ABSTRACT

The report covers the development of the on-line computing system from 1966
through 1970. It includes a general resume of progress through December 1969
and a detailed account of progress from then through June 30, 1970. The new
version of the on-line system substantially improves system reliability and
offers users new options. Significant progress in the speech analysis/synthesis
project includes improved techniques for deriving accurate data from ASCON
parameters, good results from the steady-state vowel recognizer, and one-pass
analysis and synthesis. The 1800 has been improved so that it is a more
effective research tool supporting the speech effort.

PART I

Introduction 

This final report consists of a general resume of the technical progress
previously reported in detail for the period April 26, 1966 through
December 2, 1969. Following this resume is a detailed report covering the
period from December 3, 1969 through June 30, 1970, the final reporting
period for the current contract. From the report it will be clear that
further research is indicated. Continuation of this research will be
accomplished under Contract AF19628-70-C-0314 commencing July 1, 1970.

List  of  Scientists  and  Engineers  Contributing  to  the  Research 

Dr.  Glen  J.  Culler 
Dr. David O. Harris
Dr. James A. Howard
Dr. Roger C. Wood
Mr. Ronald Stoughton

List  of  Publications  and  Reports  Resulting  from  Sponsorship  of  the  Contract 
Publications 

1. Baldwin, Jr., John A., and Glen J. Culler, "Wall-Pinning Model of Magnetic
Hysteresis", Journal of Applied Physics, Vol. 40, No. 7, June 1969,
pp. 2828 - 2835.

2. Bruch, Jr., John C. and Roger C. Wood, "The Teaching of Hydrodynamics
Using Computer Generated Displays", Bull. Mech. Engng. Educ., Vol. 1,
Pergamon Press 1962, pp. 1 - 11.

3.  Culler,  Glen  J.,  "Mathematical  Laboratories:  A  New  Power  for  the 
Physical  Sciences",  Interactive  Systems  for  Exp.  Applied  Mathematics, 
Academic  Press  Inc.,  New  York,  1968,  pp.  355-  384. 

4. Culler, Glen J., "On the Polar Equations for Linear Systems and Related
Nonlinear Matrix Differential Equations", Transactions of the American
Mathematical Society, Vol. 118, Issue 6, June, 1965, pp. 390-405.

5. Davenport, Demorest, Glen J. Culler, Richard B. Forward, and William
G. Hand, "The Investigation of the Behavior of Microorganisms by
Computerized Television", IEEE Transactions on Bio-Medical Engineering,
Vol. BME-17, No. 3, July, 1970, pp. 230-237.

6. Hendren, Philip, Experiments in Forms, Using Computer Graphics, Sept.,
1968.

7. Howard, James A., and Keith L. Doty, UCSB On-Line System Manual, Feb.,
1969.

8. Howard, James A., Roger C. Wood, "Hybrid Simulation of Speech Waveforms
Utilizing a Gaussian Wave Function Representation", Simulation, Sept.,
1968, pp. 117-124.

9. Wood, Roger C. and Philip Hendron, "A Flexible Computer Graphic System
for Architectural Design", Information Display, March/April, 1968,
pp. 35-40.

Reports 

1. First Quarterly Report, Reporting Period: April 16, 1966 - July 15, 1966.

2. Second Quarterly Report, Reporting Period: July 16, 1966 - October 15, 1966.

3. Third Quarterly Report, Reporting Period: October 16, 1966 - Jan. 15, 1967.

4. Fourth Quarterly Report, Reporting Period: Jan. 16, 1967 - April 15, 1967.

5. Fifth Quarterly Report, Reporting Period: April 16, 1967 - July 15, 1967.

6. Sixth Quarterly Report, Reporting Period: July 16, 1967 - October 15, 1967.

7. Seventh Quarterly Report, Reporting Period: Oct. 16, 1967 - January 15, 1968.

8. Eighth Quarterly Report, Reporting Period: Jan. 16, 1968 - April 15, 1968.

9. Ninth Quarterly Report, Reporting Period: April 16, 1968 - July 15, 1968.

10. Tenth Quarterly Report, Reporting Period: July 16, 1968 - Oct. 15, 1968.

11. Eleventh Quarterly Report, Reporting Period: Oct. 16, 1968 - Jan. 15, 1969.

12. Twelfth Quarterly Report, Reporting Period: Jan. 16, 1969 - April 15, 1969.

13. Thirteenth Quarterly Report, Reporting Period: April 16, 1969 - July 15, 1969.

14. Semiannual Technical Report, Reporting Period: June 2, 1969 - Dec. 2, 1969.


Related  Research  -  List  of  Previous  and  Related  Contracts 

The  development  of  on-line  computation  at  the  University  of  California 
at  Santa  Barbara  was  initiated  with  the  delivery  of  a  gift  from  the  Bunker- 
Ramo  Corporation  consisting  of  one  Teleputer  Control  unit,  one  Data  Set 
Control  unit,  and  one  storage  tube  display  device.  This  was  used  to  carry  out: 

NONR  4222(09): 

Pilot Experiment - Our pilot experimental program used the Teleputer
console donated by the Bunker-Ramo Corporation. The console was located in
the Computer Center of the University of California at Santa Barbara and
was tied by a leased telephone line to the RW-400, AN/FSQ-27 equipment at
the Bunker-Ramo Corporation in Canoga Park. Through this program we
demonstrated that an adequate curvilinear display was possible over a
conventional 201B data set. We developed the basic software underlying our
present on-line system.

ARPA  SD  319: 

An Experimental Communication Laboratory - We designed and constructed
a 16-station computer classroom and the associated time-sharing software,
which is now being used by the Electrical Engineering Department, the
Mathematics Department, and the Chemistry Department, and over long lines
by the Harvard Computation Laboratory, the University of California, Los
Angeles Physics Department, the University of Kansas, and the Livermore
Radiation Laboratory in Livermore, California.

NSF  GP  5382: 

Mathematical  Applications  of  On-Line  Computation  -  We  designed  and 
constructed  a  logical  interface  connecting  the  IBM  l?ii  Model  1  disk  drive 
to  our  on-line  system.  We  adapted  our  on-line  software  to  include  external 
users.  We  initiated  mathematical  research  in  the  areas  of  non-linear 
integral  equations  and  complex  function  theory. 

NSF  GJ  115  and  GJ  693: 

Development of an On-Line Computer Network for Chemistry Education -
This network ties nine other universities from across the nation into the
UCSB On-Line Computer System. The first station was operational in March,
1970. To date, the results from this network have been extremely successful.

Resume  of  Technical  Progress 

Work under the contract commenced April 26, 1966. Detailed technical
progress has previously been reported as indicated in the prior listing of
reports.

The general purpose of the research was to develop on-line computing.
Specific tasks were to develop a modern computing system, establish a
campus network, enhance human-computer communications, and establish a
national network for appropriate institutions. The resume will discuss
progress in each of these task areas.

Software was written to share and transfer control effectively between
the on-line system and standard batch processing. In addition, sub-programs
were developed for the large number of macros required for effective use of
the system. As normal development progressed, the operating system changed
from DOS to OS. Various languages were added to the system to enhance user
options and make the system easier to use. Special features include entering
and manipulating jobs in the batch mode from remote terminals (RJE). To
improve system reliability and make the system more exportable, a new
version of the system software was developed and installed during the final
year of the contract.

Hardware  developments  have  included  developing  the  On-Line  Computing 
System  within  the  360/50  which  was  replaced  by  the  360/65,  which  was  replaced 
by  the  current  360/75.  Network  activity  has  grown  from  a  small  nucleus  of 
campus  terminals  to  a  national  network  supported  by  an  NSF  grant  and  includes 
preliminary  operations  of  the  still-growing  ARPA  network.  To  support  the 
networks  the  necessary  interfaces,  buffer  and  multiplexor  have  been  developed. 
Within  the  Electrical  Engineering  Department  a  computer  classroom  has  been 
established  consisting  of  sixteen  stations,  and  several  smaller  classrooms 
have  also  been  installed  elsewhere  on  campus.  Peripheral  hardware  has  included 
development  of  the  double  keyboard,  a  blackboard  plotter,  a  system  employing 
a  Grafacon  tablet,  a  high  speed  buffer,  and  a  bugwatcher  to  facilitate  use 
of  the  computer  in  support  of  bio-medical  engineering  research. 

The objective of the speech project is to establish effective human
communications with the computer. Early efforts have been devoted to
identifying the various elements of human speech and analyzing those that
could be useful in this communication. Research was fruitful in that the
concept of the waveform analysis and synthesis approach has been fully
developed. The parameters have been described, techniques and procedures
outlined, and essential hardware obtained to test the fundamental elements
of speech sounds and to provide clear reproduction of these sounds. As a
natural outgrowth of the continuing research effort, early techniques,
procedures, and hardware will be modified to enhance reliability, improve
efficiency, and develop applications.


PART  II  -  Detailed  Technical  Progress  for  the  Period  January  -  June,  1970. 

Synopsis 

Network  software  development  has  progressed  in  consonance  with  network 
protocol  development. 

A  new  version  of  the  basic  system  software  was  virtually  completed 
during  this  reporting  period.  Operational  testing  under  the  rigors  of 
normal  user  activity  will  commence  on  July  1,  1970. 

Hardware development includes the Multi-Teletype Control prototype
and the High Speed Data Buffer. Both units adhere to the concept of
connecting directly to the 360 without going through the UCSB Buffer unit.

The Multi-Line Controller is in the design phase and should be developed
and operational during the next reporting period.

Significant progress has been made in the speech analysis/synthesis
project. This includes: (1) Development of a one-pass analysis/synthesis
system which substantially increases data accuracy. (2) Improved techniques
for deriving reliable recognition information from ASCON parameters. (3)
Achievement of good results from the steady-state vowel recognizer. The
entire speech project has been enhanced considerably by improvements to
the 1800, which is now a more useful tool capable of handling a
substantial portion of the computing required in the continuing research effort.

Technical Findings and Major Accomplishments

A.  Software 

(1) ARPA Network

As a result of network protocol modifications it was necessary to set
aside previous work on the network software and virtually start over again
on May 1, 1970. In planning the new implementation, it was decided to make
it independent of the On-Line System, giving batch-mode tasks (as well as
On-Line System users) direct access to the Network.

In the new implementation, tasks communicate with the Network Control
Program (NCP) by means of a supervisor call. One supervisor call routine
suffices to perform all necessary network functions; a branch index passed
to the routine specifies the desired function. In general, the supervisor
call routine returns control to the invoking task upon initiation of the
operation. The operation is completed by the I/O interrupt handler, which
posts the event completion associated with the requesting task. This method
of signaling the completion of an event was chosen as a powerful alternative
to "blocking" the task until completion (as proposed in the Network literature),
and makes feasible the eventual use of the supervisor call by subroutines of
the On-Line System.
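
As an illustration only, the calling convention described above can be sketched in present-day Python. The class name, the branch index values, and the use of `threading.Event` in place of the event posting mechanism are all assumptions made for this sketch; the report does not give the NCP's actual interface.

```python
import threading
from enum import IntEnum


class NetFunction(IntEnum):
    """Hypothetical branch indices; the report does not list the real ones."""
    READ = 0
    WRITE = 1


class NetworkControlProgram:
    """Toy stand-in for the NCP: one entry point, dispatch by branch index."""

    def __init__(self):
        self._pending = []  # operations initiated but not yet completed
        self._branch_table = {
            NetFunction.READ: self._start_read,
            NetFunction.WRITE: self._start_write,
        }

    def supervisor_call(self, branch_index, socket, data=None):
        """Initiate the operation and return at once with a completion event."""
        done = threading.Event()
        result = {}
        self._branch_table[NetFunction(branch_index)](socket, data, done, result)
        return done, result

    def _start_read(self, socket, data, done, result):
        self._pending.append((done, result, "read", socket, None))

    def _start_write(self, socket, data, done, result):
        self._pending.append((done, result, "write", socket, data))

    def io_interrupt(self):
        """Stand-in for the I/O interrupt handler: finish one operation, post it."""
        done, result, kind, socket, data = self._pending.pop(0)
        result["status"] = f"{kind} complete on socket {socket}"
        done.set()  # "post" the event; the caller was never blocked by the call


if __name__ == "__main__":
    ncp = NetworkControlProgram()
    done, result = ncp.supervisor_call(NetFunction.WRITE, socket=5, data=b"hello")
    # Control has already returned; the task waits only when it chooses to.
    ncp.io_interrupt()
    done.wait()
    print(result["status"])
```

Because the call returns an event rather than blocking, a caller that itself serves many users (as the On-Line System does) can keep running and test or wait on the event later.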

Currently, the NCP runs as a normal task in batch mode under HASP and
the Operating System. Upon start of execution, an initialization routine,
by making the necessary modifications to low core, establishes the NCP as both
the I/O and supervisor call first-level interrupt handlers (FLIH's), permitting
it to (1) process I/O interrupts from the IMP, and (2) gain control when
the Network supervisor call is issued by any task in the machine. Should the
NCP abnormally terminate, low core is returned to its original state, and OS's
FLIH's are reinstated. In this manner, the software can be developed and
debugged with no modifications to the operating system. Once it has reached
operational status, the NCP can either be made a resident part of OS, or run
as an extension of the Logger using the present technique.
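
The install-and-restore technique can be sketched as follows. This is a minimal illustration, assuming a dictionary stands in for the low-core handler slots; the names `installed_handlers`, `io`, and `svc` are hypothetical and do not come from the NCP.

```python
from contextlib import contextmanager


@contextmanager
def installed_handlers(low_core, replacements):
    """Swap handler slots in low_core, restoring the originals on any exit."""
    saved = {slot: low_core[slot] for slot in replacements}
    low_core.update(replacements)
    try:
        yield low_core
    finally:
        # Runs on normal return and on abnormal termination alike.
        low_core.update(saved)


# Hypothetical slot names and handler labels, for demonstration only.
low_core = {"io": "OS I/O handler", "svc": "OS SVC handler"}
try:
    with installed_handlers(low_core, {"io": "NCP I/O handler",
                                       "svc": "NCP SVC handler"}):
        raise RuntimeError("simulated abnormal termination")
except RuntimeError:
    pass
```

After the simulated failure, `low_core` again holds the operating system's own handlers, which is the property that lets the software be debugged without modifying the operating system.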

At  present,  those  routines  which  transfer  data  between  sockets  (READ  and 
WRITE)  are  operational,  and  transfers  between  processes  in  the  360/75  have 
been  made  using  supervisor  calls.  With  the  adoption  of  an  official  Host-Host 
protocol  scheduled  for  July  13,  1970,  those  routines  which  establish,  switch, 
and  break  connections  will  be  developed. 

One significant change to the improved system software was to include
trailing predicates in the new version. This reverses the action intended
when the last technical report was written. The rationale for including
trailing predicates was primarily that numerous users were employing
this feature and eliminating it would cause a great deal of anguish among
these users. It was felt that accepting the slight additional software
overhead would be much easier than re-educating the entire user population.

Development of the new version is virtually completed. Many elements
have been satisfactorily checked by development personnel. These verifications
foster optimism; however, the acid test will come with the deluge of programs
and operations of normal system use by our user group. The target date for
normal operational use of the new system is July 1, 1970. Because of this,
the software portion of this report is abbreviated; more extensive
details will be included in the next report after the new system has been in
operational use.

B. 360/75 On-Line System, Hardware

In the last technical report a new direction for system development
was set forth. Development was to proceed away from further attachment of
special devices to the existing UCSB Buffer and toward directly addressable
I/O devices attached to the Multiplexor Channel of the System/360. The
attachment of the IMP was done in this manner. Figure B-1* shows the present
hardware distribution.

* This Figure was included in the last technical report and is presented again
for reference.

Implementation of hardware for direct attachment is presently underway and
none has been operated as yet. However, fundamental changes in the software
have been made to allow the addition of the new devices by direct attachment
when the hardware is completed.

The last report discussed the use of the UCSB Buffer as a "test-bed" for
new devices while maintaining its present operation. In this way new devices
would be attached to the existing Buffer for test until the direct attachment
facilities exist. Two such attachments are underway. The first is the
Multi-Teletype controller that will operate Teletypes located around the
campus; the second is the modification of an existing segment on the Buffer
to allow half-duplex operation with an acoustic coupler unit. Both of these
devices will subsequently be attached to future hardware for direct program
operation by the 360.

The following items summarize the position of the several projects underway:

(1) The Multi-Teletype Control prototype unit has been tested on the
existing Buffer. Fabrication has been completed on the final unit and tests
will proceed using the UCSB Buffer until direct attachment is achieved.

(2) The High-Speed bulk data buffer is presently in check-out. A
special direct attachment will be implemented for use on the System/360.

(3) The direct attachment facility will be gained through use of a
Multi-Line Controller which is presently in the design phase. When completed,
the Multi-Teletype Control, additional display consoles, a program-settable
time interval controller, remote computers, and remote job entry units will
be attached.

C. Speech Project, General

The  progress  of  the  speech  project  is  discussed  in  the  following  sections 
under  the  headings  theory,  software,  and  hardware  respectively. 

Speech  Project,  Theory 

The  theoretical  aspects  of  the  speech  program  were  concerned  with  the 
following  areas  during  this  period: 

(1) Examination of the wave-function structure of the phonemes of the
English language to define the requirements of the wave-function model of human
speech.

(2) Definition of a preprocessing method to filter the raw speech string
into sub-strings amenable to wave-function analysis.

(3) Development of a one-pass wave-function analysis/synthesis system
based on the Gaussian Cosine Modulation (GCM) Model that will accurately analyze
and synthesize both male and female speech data. The analysis programs have
been structured to provide parameters that are compatible with the work being
done on speech recognition (for example, precision frequency information).

(4) Continued studies on the computer classification and recognition of
phonetic information, including extraction of recognition parameters from the
ASCON parameter set, recognition of steady-state vowels and vowels embedded
between two unvoiced phonemes for a single speaker, and preliminary studies of
the segmentation of connected phonemes.

(5) Studies of the data rate of the basic ASCON representation and the
amount of data compression possible through the elimination of redundant
wave-function sets.

(6) Preliminary definition of the interrelationship between the
wave-function representation and a classical formant model of human speech.
Empirical formulae have been developed relating the ASCON parameters
to formant amplitudes, frequencies, and bandwidths.

The  above  topics  are  discussed  in  detail  in  the  following  sections. 

(a) Wave-Function Structure of English Phonemes

The success of an analysis system based on the Gaussian wave-function
representation depends upon the accuracy with which the model covers
the set of wave-functions found in filtered human speech. To verify the
completeness of the model, the wave-function structure of each of the 34 basic
English phonemes, for a male and a female voice, was studied. Two different
filtering methods were employed for preprocessing the raw acoustic data into
sub-strings amenable to wave-function analysis. These were as follows:

1. Adaptive Filtering - Filtering the raw speech data around the
formants of the short-term energy spectrum as described in the previous
semi-annual report.

2. Fixed Filtering - Filtering the speech data into four contiguous
frequency bands covering the frequency range 100 - 3600 Hz in which the filter
characteristics are fixed (identical) for all phonemes.

The two filtering methods gave equivalent results. Both showed that the
wave-function model is not complete, in the sense that it does not cover all
wave-functions found in filtered human speech. There are actually two separate
classes of wave-functions of which the acoustic waveform may be composed.

1. Waveforms with a Gaussian Envelope - The cyclic behavior of the
waveform under the envelope may be described by either some appropriate Hermite
polynomial (Gaussian Wave-Function Model) or an appropriate cosine function
(Gaussian Cosine Modulated Model).

2. Sinusoids - Sinusoidal waveforms occurring in the frequency
region defined by the pitch-period.

Experimental results have shown that the family of sinusoidal wave-functions
occurs only in that frequency region defined by the pitch-period. Higher
frequency regions contain waveforms with Gaussian envelopes. The occurrence of
a sinusoid is functionally dependent on the pitch-period of the voice. As the
male voice pitch-period shortens, the wave-function structure changes from a
Gaussian envelope character to a sinusoidal character. This is illustrated in
Figure C-1a and b for a male speaker uttering the vowel /i/ as in "eve" at
pitch-periods of 7.8 msec and 4.1 msec respectively.

Two additional examples illustrate this effect. Since the female voice
typically has a short pitch-period relative to the male voice, it would be
expected from the above that the female voice would have a strong sinusoidal
component for the voiced sounds. Figure C-2 shows a comparison between a normal
male and female voice uttering the word "put". The male wave-function structure
(Figure C-2a) has a consistent Gaussian characteristic for both the plosive and
vowel sounds, whereas the female voice exhibits a sinusoidal characteristic
during the vowel segment. Figure C-3 compares a normal male and female voice
uttering the word "mat". The male voice (Figure C-3a) exhibits a consistent
Gaussian characteristic, whereas the female voice (Figure C-3b) shows a
sinusoidal characteristic for the voiced nasal consonant and vowel, and then
becomes Gaussian during the plosive.

The above examples show that a general wave-function analysis system must
be able to represent accurately wave-functions with both Gaussian envelope
characteristics and sinusoidal characteristics.

A method has been developed as part of the new analysis system that builds
sinusoids out of GCM waveforms. This permits a consistent set of parameters to
be generated by the new wave-function analyzer.
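
The report does not describe how the sinusoids are built. One construction that would work, shown here purely as an assumption, sums GCM waveforms whose centers lie one spread apart and whose cosines share a common phase reference; the overlapping Gaussian envelopes then add to a nearly constant value, leaving a sinusoid. The function names and the choice of spacing are hypothetical.

```python
import math


def gcm(t, center, spread, freq_hz, amplitude=1.0):
    """One Gaussian-cosine-modulated waveform: Gaussian envelope times a cosine."""
    envelope = math.exp(-((t - center) ** 2) / (2.0 * spread ** 2))
    return amplitude * envelope * math.cos(2.0 * math.pi * freq_hz * t)


def sinusoid_from_gcm(t, freq_hz, spacing, n_terms):
    """Sum n_terms GCM waveforms centred `spacing` apart, with spread == spacing."""
    total = sum(gcm(t, k * spacing, spacing, freq_hz) for k in range(n_terms))
    # Unit Gaussians spaced one spread apart sum to very nearly sqrt(2*pi),
    # so dividing by that constant leaves a unit-amplitude cosine.
    return total / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    # Compare against a pure 200 Hz cosine well inside the summed region.
    t = 0.1
    print(sinusoid_from_gcm(t, 200.0, 0.005, 40), math.cos(2.0 * math.pi * 200.0 * t))
```

The approximation holds only away from the first and last centers, where the envelope sum has not yet reached its constant value.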

(b) Preprocessing of Acoustic Waveform

In the previous semi-annual report, a method was presented for filtering
the input speech string into sub-strings that are amenable to wave-function
analysis. This technique required tracking the major energy peaks in
the short-term energy spectrum of the acoustic waveform (i.e., formant tracking
for vowels) and adjusting the center frequencies and bandwidths of the filters
to cover the frequency range from 300 - 3200 Hz. It was demonstrated that
this approach would correctly filter the original speech string.

This method of preprocessing the input speech data was employed successfully
in both the analysis/synthesis and recognition studies. However, it became
increasingly evident that the digital simulation of an automatic tracking filter
was a complicated process and required the major portion of computer time in
both the wave-function analysis/synthesis and recognition studies. In view of
this, further studies of the preprocessing problem were undertaken with the
intention of defining a set of fixed frequency ranges which would correctly
filter the acoustic waveform into acceptable sub-strings for both the male and
female voice. This study has been successful in defining the four contiguous
frequency bands


     100 -  400 Hz    Band 1
     400 -  900 Hz    Band 2
     900 - 1800 Hz    Band 3
    1800 - 3600 Hz    Band 4

which  correctly  partition  the  original  speech  string  into  four  sub-strings 
whose  waveform  structure  is  of  a  form  suitable  for  Gaussian  wave-function 
analysis. 
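
As a small illustration, the four band limits can be held in a table and used to look up the band in which a given frequency falls. The function name and the convention that a frequency on a shared limit belongs to the upper band are assumptions of this sketch, not statements from the report.

```python
from bisect import bisect_right

# Band limits in Hz: 100-400, 400-900, 900-1800, 1800-3600.
BAND_EDGES_HZ = [100, 400, 900, 1800, 3600]


def band_for(freq_hz):
    """Return the band number (1-4) containing freq_hz, or None outside 100-3600 Hz."""
    if not BAND_EDGES_HZ[0] <= freq_hz < BAND_EDGES_HZ[-1]:
        return None
    # Assumed convention: a frequency exactly on a shared limit goes to the upper band.
    return bisect_right(BAND_EDGES_HZ, freq_hz)
```

For instance, `band_for(250)` gives Band 1 and `band_for(2000)` gives Band 4.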

An  experimental  approach  was  taken  in  the  determination  of  the  fixed 
filter  bandwidths  and  center  frequencies.  Initially  the  following  criteria 
were  established  as  the  basis  for  the  selection  of  the  filter  specifications: 

1. Filter parameters must be selected so that the resulting
sub-strings have an appropriate wave-function structure to fit the wave-function
analysis model.

2. The number of fixed filter bands must be kept to a minimum
to avoid additional complexity in the analysis scheme.

3. The fixed filter parameters must be chosen such that the resulting
bands contain relevant information for recognition.

The experimental endeavor involved generation of the short-term energy spectrum
of each of the 34 English phonemes for a male and a female speaker. The first three
major energy peaks for each phoneme were then plotted as a function of frequency.
Examination of the resultant plots indicated that the energy peaks occurred roughly
in three distinct frequency regions: below 1000 Hz, 1000 - 2000 Hz, and above 2000 Hz.
These three fixed regions gave adequate filtering results even though two major
energy peaks would sometimes be grouped together, as for example for the vowel /a/.
Further studies showed that breaking the region below 1000 Hz into two bands at
400 Hz and setting the upper frequency limit above 2000 Hz to 3600 Hz gave
consistently good results. A frequency of 100 Hz was established as the lower
frequency limit to minimize random low amplitude noise occurring in the region
below 100 Hz.

The upper limit on Band 1 of 400 Hz was selected to correspond to a minimum
pitch-period of 2.5 milliseconds. For the spectrum of pitch-periods encountered
in the male and female voice this represents a reasonable choice. Additional
experimental studies demonstrated that setting the upper limit on Band 2 at
900 Hz improved the results even more, since this separated the second major
energy peak from the first for several of the phonemes.
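
The correspondence between the 400 Hz limit and the 2.5 millisecond pitch-period is the reciprocal relation between period and frequency. The short sketch below works the arithmetic; the function names are illustrative only.

```python
def pitch_frequency_hz(period_ms):
    """Fundamental frequency corresponding to a pitch-period given in milliseconds."""
    return 1000.0 / period_ms


def pitch_period_ms(freq_hz):
    """Pitch-period in milliseconds corresponding to a fundamental frequency in Hz."""
    return 1000.0 / freq_hz


# The 400 Hz upper limit of Band 1 corresponds to a 2.5 msec pitch-period.
band1_limit_period = pitch_period_ms(400.0)
# A 7.8 msec male pitch-period lies near 128 Hz, inside Band 1.
male_example_hz = pitch_frequency_hz(7.8)
```

Any voice whose pitch-period is longer than 2.5 msec therefore has its fundamental below the upper limit of Band 1.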

In the determination of the appropriate fixed filter bands it was useful
to recast the Peterson-Barney vowel map (1) into the line form depicted in
Figure C-4. This vertical line plot represents the possible frequency ranges
of the first two formants F1 and F2 (first two energy peaks) of the twelve
vowels for a mix of 76 male, female, and child speakers.

The horizontal lines on Figure C-4 define the bandwidths of the four fixed
filters. The figure illustrates that the frequency ranges defined by the four
fixed filter bands do contain useful frequency information. From a consideration
of the figure it can be seen that five vowels (/i/, /I/, /e/, /ɛ/, /u/) have
the possibility of the first formant occurring within Band 2, and in some cases
the second formant may also occur within Band 2. Eight of the vowels can have
a first or second formant occurring within Band 3, and seven vowels may have the
second formant falling within Band 4. The point to note is that due to the
positions of the formants within these fixed bands, some recognition information
is available by a simple examination of the sub-string. For example, due to the
lack of a formant in Band 3, the vowels /i/, /I/, /ɛ/, and /e/ will have very low
amplitude in this band as compared to the other three bands. This is clearly
illustrated in Figure C-5, which shows the output of the fixed filter
preprocessor for the word "steek". Figure C-5a shows the word prior to filtering
and Figure C-5b illustrates the four filtered sub-strings. In the 900 -
1800 Hz sub-string it can be seen that during the vowel portion (/i/) the
amplitude is indeed almost insignificant as compared to the other sub-strings.
Also note that the largest amplitudes occur in Bands 1 and 4, because these
are the bands in which the formants occur.

(1) Peterson, Gordon E. and Barney, Harold L., "Control Methods Used in a Study
of the Vowels", J. Acoustical Society Am., Vol. 24, pp. 175 - 184, March 1952.

Consider the usefulness of these four frequency bands with respect to
phonemes other than vowels. Figure C-6 shows the word "shop" and its four
sub-strings. Note that the fricative phoneme /sh/ stands out clearly in the
1800 - 3600 Hz sub-string, the vowel portion stands out as a repetitive
structure in all four sub-strings, and the /p/ phoneme is indicated primarily
as a burst of low frequency wave-functions in the 100 - 400 Hz sub-string.
The combination of Bands 1 and 4 can serve as strong indicators of voiced vs.
unvoiced phonemes.

Another example is the word "men" as shown in Figure C-7. This example
demonstrates how a nasal phoneme, with its voiced repetitive-like structure,
can be distinguished from a vowel. Most of the power of a nasal resides in
the 100 - 400 Hz band while the vowel has significant energy in at least
three bands.

Figure C-5  (a) The word "steek", unfiltered
            (b) The 4 sub-strings of the word "steek": 100 - 400 Hz,
                400 - 900 Hz, 900 - 1800 Hz, 1800 - 3600 Hz
                (all sin x/x filters)

The fixed filter ranges, although experimentally determined, do exhibit
the common property that there is approximately an octave change across the
filter bandwidth. For example the lower frequency of Band 3 is 900 Hz while
the upper frequency is 1800 Hz, an octave difference. Note that Band 1 is
the one exception to this observation.
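
The octave observation can be checked numerically. The following is a minimal sketch, assuming the band edges 100, 400, 900, 1800 and 3600 Hz quoted in the discussion above; the names `BANDS_HZ`, `band_width_octaves` and `octave_table` are illustrative and do not come from the report.

```python
import math

# Fixed-filter band edges in Hz, as quoted in the text (assumed values).
BANDS_HZ = {
    1: (100.0, 400.0),
    2: (400.0, 900.0),
    3: (900.0, 1800.0),
    4: (1800.0, 3600.0),
}


def band_width_octaves(low_hz, high_hz):
    """Return the width of a band in octaves, i.e. log2(high / low)."""
    return math.log2(high_hz / low_hz)


def octave_table(bands=BANDS_HZ):
    """Map each band number to its width in octaves."""
    return {n: band_width_octaves(lo, hi) for n, (lo, hi) in bands.items()}
```

Under these assumed edges, Bands 3 and 4 span one octave each, Band 2 spans slightly more than one octave, and Band 1 spans two octaves, which is the exception noted in the text.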

Figure  C-8  compares  the  original  and  synthetic  versions  of  the  word 
"max"  as  recorded  for  a  male  speaker  for  the  adaptive  and  fixed  filtering 
approaches.  As  indicated  the  synthesized  version  of  "max"  using  adaptive 
filtering  (Figure  C-8b)  and  that  using  fixed  filtering  (Figure  C-8c)  compare 
favorably  with  the  original  word. 

The purpose of the preprocessor is to transform the input acoustic
waveform denoted as a "string" into "sub-strings" that are amenable to
wave-function analysis. It has been demonstrated above that a high-quality
wave-function representation can be obtained by filtering the input string
into four sub-strings covering the frequency range from (100, 3600) Hz. Let
s(t) be the original string with frequency components in the range (100, 3600) Hz.
Then

$$ s(t) = \sum_{n=1}^{4} s_n(t) \qquad \text{(C-1)} $$

where $s_n(t)$ is the nth sub-string. In the frequency domain

$$ S(j\omega) = \sum_{n=1}^{4} S_n(j\omega) $$

The frequency regions $R_n$ corresponding to $S_n(j\omega)$, n = 1, ..., 4 are divided in
the following manner:

    100 < R_1 < 400 Hz
    400 < R_2 < 900 Hz
    900 < R_3 < 1700 Hz
   1700 < R_4 < 3600 Hz

This separation into four contiguous frequency regions corresponds to
convolution of sin x/x type bandpass filters with s(t) to obtain the sub-strings.
By applying the discrete convolution equation each of the four sub-strings
for the system simulation is obtained as

$$ s_n(k) = \sum_{j=-62}^{62} s(j)\, h_n(j-k) \qquad \text{(C-2)} $$

where

$$ s_n(k) = s_n(t)\big|_{t=kT}, \qquad k = 0, 1, 2, \ldots $$

$$ h_n(j-k) = 2B_n \, \frac{\sin(2\pi B_n (j-k))}{2\pi B_n (j-k)} \, \cos[2\pi F_n (j-k)] $$

and T is the discrete sampling period. The parameters of the nth convolution
kernel $h_n(t)$ are defined from the nth region. For example $B_2$ = 900 - 400 = 500 Hz
and $F_2$ = (400 + 900)/2 = 650 Hz. Equation (C-2) is utilized for simulating
the fixed-filter preprocessor on the UCSB 1800 speech system.
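
The fixed-filter convolution can be sketched as follows. This is a minimal illustration and not the UCSB 1800 program: it assumes the textbook truncated ideal band-pass kernel, 2B · [sin(πBt)/(πBt)] · cos(2πFt), scaled by the sampling period so that the pass-band gain is close to one, and keeps the 125 taps (j = -62, ..., 62) of Equation (C-2). The function names and the kernel normalization are assumptions.

```python
import math


def bandpass_kernel(f_lo, f_hi, fs, half_len=62):
    """Truncated ideal band-pass impulse response sampled at rate fs (Hz).

    B is the bandwidth and Fc the center frequency of the band, in the
    sense of B_n and F_n in the text (assumed textbook kernel form).
    """
    bandwidth = f_hi - f_lo
    center = 0.5 * (f_lo + f_hi)
    period = 1.0 / fs
    taps = []
    for m in range(-half_len, half_len + 1):
        t = m * period
        x = math.pi * bandwidth * t
        sinc = 1.0 if m == 0 else math.sin(x) / x
        taps.append(2.0 * bandwidth * period * sinc
                    * math.cos(2.0 * math.pi * center * t))
    return taps


def filter_substring(samples, taps):
    """Discrete convolution of the sampled string with one kernel."""
    half = len(taps) // 2
    out = []
    for k in range(len(samples)):
        acc = 0.0
        for m in range(-half, half + 1):
            j = k - m
            if 0 <= j < len(samples):   # samples outside the record count as zero
                acc += samples[j] * taps[m + half]
        out.append(acc)
    return out
```

Calling `bandpass_kernel(400.0, 900.0, 17500.0)` gives a Band 2 kernel for a 17.5 kHz sampling rate; running `filter_substring` once per band yields the four sub-strings.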

(c)  Improved  Wave-Function  Analysis/Synthesis  System 

The  process  for  wave-function  analysis  of  human  speech  that  has 
evolved  at  UCSB  is  a  three-step  operation. 

1.  Record  a  sample  of  speech  of  time  length  T 

2.  Preprocess  (filter)  the  speech  sample  into  four  sub-strings 
each  of  duration  T 

3.  Analyze each sub-string into its set of ASCφN parameters.

Previous work at UCSB has used the Gaussian Wave-Function family as a
model for the individual wave-functions in the filtered sub-strings of the
raw speech string. The family of Gaussian wave-functions is the set of
derivatives of the Gaussian function $e^{-t^2/2}$. The nth function is explicitly
described by the Gaussian function multiplied by a Hermite polynomial of degree
n. The family of functions satisfies the differential equation

2  2 

0  =  U(t)  *  (£)  (t-C)U(t)  *  (2.)  (K2  -  i)U(t) 


(c-3) 


with  initial  conditions 
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U(C)  =>  A  cos  <f 
U(C)  =  A  ^  sin  j) 


N  is  a  physically  descriptive  parameter  defining  the  number  of  half  cycles 
of  the  Hermite  Polynomial  which  occur  under  the  envelope  of  a  particular  wave- 
function.  The  relationship  defining  N  is 


where n = order of the Hermite Polynomial. There is no closed form solution
which describes this family of wave-functions. It has been necessary to
implement a recursive solution to Equation (C-3) in order to generate any
arbitrary wave-function of this family. This complicates the problem of
analysis and synthesis.

As reported previously, an asymptotic solution to Equation (C-3) has
been obtained which is of a closed form and is also valid for the range of
wave-functions encountered in human speech. This solution defines the
Gaussian Cosine Modulated (GCM) family of wave-functions. Any arbitrary member
of this family of wave-functions can be described by

-(J  (t-c))2 

u(t)  =  Ae  ®  cos  (u>  (t-c)  -  <f>) 


(c-4) 


id 

o 
The solution is thus a Gaussian envelope of amplitude A, center in time of C,
and spread in time of S, multiplied by a cosine wave of frequency F_0 and
phase φ with respect to C.

This model still has only five parameters describing the entire wave-
function but now the set of ASCφN parameters is replaced by the set of ASCφF
parameters. A representative wave-function is shown in Figure C-9.

The parameters of the GCM model, as in the Gaussian Wave-Function Model,
are chosen to be physically meaningful in describing the given wave-function.

A = Amplitude of envelope of wave-function

S = Spread of wave-function envelope, or that time interval during
    which 99% of the energy of the wave-function occurs

C = Center of the envelope in time

φ = Phase of the cosine wave with respect to C

F = Frequency of the cosine wave

From  a  computational  and  conceptual  viewpoint  the  GCM  model  is  much 
simpler  to  operate  with.  Therefore  a  new  analysis  and  synthesis  has  been 
developed  based  upon  this  model. 
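
A GCM wave-function can be evaluated directly from its five parameters. The sketch below is an illustration under a stated assumption: the envelope scale factor is taken from the definition of S given above (99% of the envelope energy inside an interval of length S centered on C), which gives a squared envelope that is a normal curve whose two-sided 99% point lies at S/2. The constant `Z99` and the function names are hypothetical and are not taken from the report.

```python
import math

# Two-sided 99% point of the standard normal distribution (assumption used
# to tie the envelope width to the spread parameter S).
Z99 = 2.5758


def gcm(t, A, S, C, phi, F):
    """Gaussian Cosine Modulated wave-function at time t.

    A: envelope amplitude, S: spread, C: center time,
    phi: phase in radians relative to C, F: cosine frequency.
    t, S, C and 1/F must share one time unit.
    """
    alpha = Z99 / S                      # assumed envelope scale factor
    tau = t - C
    return A * math.exp(-(alpha * tau) ** 2) * math.cos(2.0 * math.pi * F * tau - phi)


def envelope_energy_fraction(S, steps=20000):
    """Fraction of the squared-envelope energy lying within S/2 of the center."""
    alpha = Z99 / S
    span = 4.0 * S
    dt = 2.0 * span / steps
    total = 0.0
    inside = 0.0
    for i in range(steps):
        tau = -span + (i + 0.5) * dt     # midpoint rule
        e = math.exp(-2.0 * (alpha * tau) ** 2)
        total += e
        if abs(tau) <= 0.5 * S:
            inside += e
    return inside / total
```

With this scaling, `envelope_energy_fraction` returns a value close to 0.99 for any positive S, which matches the stated meaning of the spread parameter.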

GCM  Wave-Function  Analysis 

Previous work in developing the wave-function analysis process required
a multi-pass analysis on a given sub-string of data. This process is undesirable
because it generates multiple sets of wave-function parameters for each
sub-string and multiplies the amount of time required to analyze a given
sub-string by the number of passes on the sub-string. Investigation showed that
multi-pass analysis was required due to three problem areas.

1)  Improper  preprocessing  (filtering)  of  the  raw  speech  string 

2)  Inaccuracies  in  the  analysis  process 

a)  Sampled  data  inaccuracies 

b)  Failure  to  set  practical  limits  on  calculated  parameter  values 

3)  Inability  to  handle  sinusoids,  particularly  on  the  female  voice. 

From an ideal standpoint, a one pass analysis system is desirable in order
to achieve maximum speed in the analysis process and a minimum number of
parameters to describe the speech sub-string.

In order to achieve this goal, a new analysis system, based upon the
GCM model, has been developed which accurately performs a one pass analysis
on any arbitrary speech input from a male or female voice. In conjunction with
this, a GCM based wave-function speech synthesis system has also been completed.

Each of the problem areas that necessitated multi-pass analysis has been
investigated and the appropriate solution has been implemented in the new
system. Problem area one, improper preprocessing, has been solved by the
definition of the four correct fixed filter bands previously discussed and
then constructing the appropriate sin x / x band pass kernels to use in the
existing digital filter convolution programs. Problem area two, inaccurate
analysis, has been solved by performing an error analysis on the wave-function
process to define the significant analysis errors that needed correction.

These  were: 

1)  Improper estimation of extrema and times of occurrence due to
inaccuracies in the sampled data. This factor introduces a significant error,
that is a function of frequency, into all five parameters. The error is
minimized by implementing a parabolic curve fitting operation to the sampled
data during the extrema detection operation.

2)  Failure to set bounds on calculated parameters. The sampled speech
data varies at times significantly enough from the wave-function model, so
that, unless the S parameter is bounded, irrecoverable errors are introduced
in the Residue calculation process. To avoid this, when analyzing a given
wave-function, the speech string is extrapolated into the future to locate the
next wave-function. S is then bounded so the present wave-function does not
couple beyond the center C of the future wave-function. The character of the
future wave-function is therefore not destroyed by an error in the calculation
of the parameters for the present wave-function. Errors in the other four
parameters were found to be generally small enough to avoid the necessity of
bounding them.

The third problem area, inability to handle sinusoidal wave-functions,
was solved by

1)  Setting filter Band 1 to an upper limit of 400 Hz. This limits the
sinusoidal component to Band 1.

2)  Constructing an algorithm in the analysis process that detects the
presence of a sinusoid and then generates a set of ASCφF parameters from which
the sinusoid can be built. The algorithm is specifically tailored to generate
one wave-function per pitch period, even during a sinusoid, so that the pitch
information contained in the C parameter is retained for the recognition system.

As in previous work, the new wave-function analysis system is based upon
a four point analysis of a given wave-function that uses the four extrema
grouped around C to calculate the five parameters to define the wave-function.
This requires an extrema detection process which maps the sampled data format
into an extrema (peak vs. time) format. Therefore the first step prior to
the analysis process on the filtered sub-string is to convert the sampled data
sub-string into a list of extrema. Define

S(t)   = Acoustic waveform speech string

Sd(t)  = Sampled acoustic waveform speech string

Sdn(t) = Sampled filtered sub-string, n = 1,2,3,4

ω_n    = Extrema listing of speech string, n = 1,2,3,4

A block diagram of the process prior to analysis would then be as shown
in Figure C-10.

The analysis process then consists of scanning the extrema list ω_n to
isolate the wave-functions in the sub-string and then to calculate the five
ASCφF parameters which define each isolated wave-function. Since four points
are used to characterize a wave-function, the extrema list is scanned using
four points at a time until a stopping condition for a wave-function occurs.
Determining the ASCφF parameters is therefore a two step process: 1) Satisfy
stopping criteria to isolate a wave-function 2) Calculate ASCφF parameters.

Once the parameters of a given wave-function have been determined, the effect
of the wave-function coupling into the future must be removed to be able to
correctly determine the parameters of the next wave-function. This is
accomplished by building the calculated wave-function and then subtracting out its
effect from the extrema listing. This process is the Residue calculation.

A block diagram of the entire analysis process is shown in Figure C-11.

Extrema Detection and Correction

The sub-string to be analyzed is in a sampled data form. This is converted
to a sign magnitude (peak) vs. time format by the extremum detection and
correction operation.

Let
    x_j = jth sample data point

    j = index of sample data list, j = 1,...,7440

Figure C-10  Block diagram of preprocessing and extrema conversion processes.

Figure C-11  Block diagram of wave-function analysis process

Figure C-12  Extrema detection and correction flow-chart

The operation is defined in the flow chart of Figure C-12. The analysis system
used depends upon accurate extrema parameter values. Since sampled data is
only an approximation to the true extrema, a parabolic curve fitting is done
to the sampled data to precisely define the extrema. This is accomplished by
the following relations.


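Sampling Frequency = 17.5 kHz
TS = 1/17.5

$$
\begin{aligned}
T &= (j-1) \times TS && \text{Time in msec. of } j\text{th sample} \\
A &= (x_{j+1} - 2x_j + x_{j-1})/(2\,TS^2) \\
B &= (x_{j+1} - x_{j-1})/(2\,TS) \\
X_{CORR} &= -B^2/(4A) && \text{Correction to peak value} \\
T_{CORR} &= -B/(2A) && \text{Correction to time value}
\end{aligned}
\qquad \text{(C-5)}
$$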
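
The parabolic correction can be illustrated with a short sketch. It assumes the usual three-point parabola through a sample and its two neighbors, for which the vertex lies -B/(2A) from the middle sample and -B²/(4A) above it. The simple sign-change test used to find candidate extrema is an assumption, since the flow chart of Figure C-12 is not reproduced here; the function names are illustrative.

```python
import math


def parabolic_correction(x_prev, x_mid, x_next, ts):
    """Return (correction to peak value, correction to time value).

    Fits x(t) = x_mid + B*t + A*t**2 through three equally spaced samples
    and returns the offset of the vertex from the middle sample.
    """
    a = (x_next - 2.0 * x_mid + x_prev) / (2.0 * ts ** 2)
    b = (x_next - x_prev) / (2.0 * ts)
    if a == 0.0:                 # collinear samples: no vertex to move to
        return 0.0, 0.0
    return -b * b / (4.0 * a), -b / (2.0 * a)


def extrema_list(samples, ts):
    """Convert sampled data into corrected (peak, time) pairs.

    Sample j (0-based) is taken to occur at time j * ts. A sample is a
    candidate extremum when the slope changes sign across it (assumed test).
    """
    out = []
    for j in range(1, len(samples) - 1):
        rise = samples[j] - samples[j - 1]
        fall = samples[j + 1] - samples[j]
        if (rise > 0 and fall <= 0) or (rise < 0 and fall >= 0):
            x_corr, t_corr = parabolic_correction(
                samples[j - 1], samples[j], samples[j + 1], ts)
            out.append((samples[j] + x_corr, j * ts + t_corr))
    return out
```

For a 1 kHz cosine sampled at 17.5 kHz the corrected peaks lie much closer to unit magnitude, and to the true half-millisecond spacing, than the raw samples do.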


Stopping  Criterion  (Normal) 

When scanning the extrema data list ω, a criterion is established to isolate
a wave-function behavior

    K       = Extrema data list index
    ω       = Extrema data list
    K = 1   = Initial condition
    ω_K     = Sign magnitude of extrema
    ω_{K+1} = Time of occurrence of extrema

Figure  C-13  illustrates  the  character  of  a  wave-function  in  extrema  format. 


The following quantities can be defined from Figure C-13 for a wave-function
in extrema format.

    K       = present index of extrema data list
    ω_K     = Peak 1
    ω_{K+1} = Time of peak 1
    ω_{K+2} = Peak 2
    ω_{K+3} = Time of peak 2
    ω_{K+4} = Peak 3
    ω_{K+5} = Time of peak 3
    ω_{K+6} = Peak 4
    ω_{K+7} = Time of peak 4

The  stopping  criterion  is 


STOP  if 


2 

and 

KJ  2  K' 

otherwise  increment  the  index  by  2  to  K  +  2  and  check  for  the  stopping  criterion 
again. 


If the stopping criterion is satisfied, this means that A and C will occur
during the time interval defined by (ω_{K+3}, ω_{K+5}). The stopping criterion is
equivalent to the presence of a local maximum or minimum in the extrema data
list.


Once the normal stopping criterion has been satisfied, the ASCφF parameters
of the isolated wave-function are then calculated. This process is accomplished
by making calculations based on the known geometry of the wave-function family.
The four extrema define four points in the envelope of the wave-function. This
is sufficient information to calculate the S and C parameters. Once S and C
are known, A and φ can be calculated from either of the extrema adjacent to C.
The time values of the two extrema around C can be used to determine the
frequency parameter. The ASCφF parameters are thus found from the following
expressions.

Determination of F

    ω_{K+5} = Time value of extrema to right of C
    ω_{K+3} = Time value of extrema to left of C

$$ F = \frac{1}{2\,(\omega_{K+5} - \omega_{K+3})} \qquad \text{(C-6)} $$

Choosing the two time values at the center of the wave-function minimizes
errors in the frequency calculation due to coupling from adjacent wave-functions.

Calculation of S

S  *  ?  f  o  I  t'i  1  (c-7) 


where these variables are defined in Figure C-13.


(c-8) 


,1  +  K+41 




Computation  of  C 


Calculation  of  A 


Calculation  of  $ 


c  .  u  ♦  (l  -  o>  v> 


where 


RHO 


IVyl  -  K1 

I  (JO  I  -  111 

I  V19I  *  V 


~K+2*  1  K+61 

$$ U = \left[\pi/2 - \operatorname{atan}(RHO)\right] \cdot 2/\pi $$


(c-9) 


(C-10) 


0  *  2nF  Ok+3  - 

c) 

"W  <  0 

t  -  ***  <v3  - 

c)  +  TT 

v,>0 

(c-11) 


Equations  (C-5)  through  (C-11)  represent  the  basic  wave-function  analysis 
method.  To  compensate  for  deviations  in  the  speech  data  from  the  basic  GCM 
model,  two  additions  are  required 

1)  Band  1  sinusoidal  analysis 

2)  Upper  limit  on  S 
Sinusoid  Analysis 

The sinusoidal component takes the form of a sinusoid multiplied by the
volume emphasis function of the human voice. As such the normal stopping criterion
for a GCM wave-function will not detect the existence of a sinusoid. The transition
into steady state portion, and transition out of the voiced sinusoid must be
detected. This is accomplished by again forming ratios based upon the four
extrema being tested in the extrema list. The sinusoid is characterized by
all four of the adjacent extrema being of approximately the same magnitude.
Therefore the sinusoid stopping criterion is as follows using Band 1 information.

    Is x ≤ z ?
        Yes    ETA = x/z
        No     ETA = z/x

    Is ETA < .9 ?
        Yes    Go to GCM Stopping Criterion
        No     Is y ≤ z ?
                   Yes    ETA = y/z
                   No     ETA = z/y

               Is ETA < .9 ?
                   Yes    Go to GCM Stopping Criterion
                   No     A sinusoid exists

Then set

    Gamma = .3
    U = .5

and set these values into the normal set of equations to calculate the ASCφF
parameters.
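
The ratio test above can be sketched as a small predicate. The text does not define x, y and z at this point, so the sketch simply treats them as three extrema magnitudes taken from the group under test; that reading, the function name and the handling of a zero magnitude are assumptions.

```python
def is_sinusoid(x, y, z, threshold=0.9):
    """Apply the two ETA ratio tests; True means 'a sinusoid exists'.

    Each ETA is the smaller magnitude divided by the larger one, so it lies
    in [0, 1]. Any ETA below the threshold sends the analysis back to the
    normal GCM stopping criterion (returned here as False).
    """
    for other in (x, y):
        small, large = sorted((abs(other), abs(z)))
        if large == 0.0:          # no signal: do not report a sinusoid
            return False
        eta = small / large
        if eta < threshold:
            return False
    return True
```

When the predicate is true, the analysis would continue with Gamma = .3 and U = .5 as stated above.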

This will create a GCM wave-function to fill a pitch interval (one cycle
of the sinusoid) of the voiced sound. This process is graphically illustrated
in Figure C-14.

Since there is no coupling from the derived GCM wave-function into the
future, the Residue calculation process is bypassed.


[Flow-chart blocks, COMPUTATION OF ASCφF PARAMETERS:
F = 1/[2(ω_{K+5} - ω_{K+3})];  S = f(F, GAMMA);  RHO = f(4 extrema);
U = [π/2 - atan(RHO)] · 2/π;  C = f(U, ω_{K+3}, ω_{K+5});
A = f(S, C, ω_{K+2}, ω_{K+3});  φ = f(ω_{K+3}, C, F);
RESIDUE CALCULATION;  K = K + 6]

Upper Bound on S Parameter

Of the five parameters, errors in S have the most effect on introducing
errors in the analysis process. This is because the S parameter defines the
amount of coupling between adjacent wave-functions. There are significant
enough deviations in the actual speech data from the GCM model to introduce
unacceptable errors in the S calculation if S is left unbounded. To minimize
error coupling between adjacent wave-functions, S is limited so that the
present wave-function does not couple past the center of the next wave-
function. The maximum value of S, S_max, is given by the relation

$$ S_{max} = 2\,(C_{i+1} - C_i) \qquad \text{(C-12)} $$

where i is the wave-function index.

S_max can be approximated by


=w =  2t(aw>1+1  - 


(c-13) 


The  solution  to  Equation  (C-13)  is  accomplished  by  searching  the  extrema 
list  ahead  in  time  until  the  next  wave- function  is  isolated  by  the  normal  stopping 
criterion. 
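
The bounding step can be sketched in a few lines. The sketch assumes the limit S_max = 2(C_{i+1} - C_i), that is, an envelope of total spread S centered on C_i may reach at most as far as the center of the next wave-function. In the analysis itself the next center is only estimated by the look-ahead search described above; here it is passed in as a known value, and the function name is illustrative.

```python
def bound_spread(s, c_present, c_next):
    """Limit the spread S of the present wave-function.

    s: calculated spread, c_present: center C of the present wave-function,
    c_next: (estimated) center of the next wave-function, same time unit.
    """
    if c_next <= c_present:
        raise ValueError("the next center must lie later in time")
    s_max = 2.0 * (c_next - c_present)
    return min(s, s_max)
```

For example, a calculated spread of 5 msec. with centers 1.5 msec. apart would be cut back to 3 msec., while a spread of 2 msec. would pass unchanged.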

Lower  Bound  on  Gamma 

Gamma = .3 corresponds to four peaks existing under a Gaussian envelope
in the GCM model. Since a four point analysis system is used, Gamma_min = .3.

A  flow-chart  of  the  complete  wave-function  analysis  process  is  shown  in 
Figure  C-15. 

GCM  Wave-Function  Synthesis 


Assuming that the ASCφF parameters have been correctly sorted into the
four different sub-string sets, the synthesis is a straightforward process.

where

$$ \pi(i,n) = \{A(i,n),\ S(i,n),\ C(i,n),\ \phi(i,n),\ F(i,n)\} \qquad \text{(C-16)} $$

as the wave-function parameter set. Since each wave-function dies off as an
exponential squared, only those located nearest to the corresponding present
time t need to be evaluated at the index k.

The total synthesis of the estimated string s(t) is denoted by ŝ(t) and
is calculated in discrete form by summing the sub-strings. Thus

$$ \hat{s}(k) = \sum_{n=1}^{4} \hat{s}_n(k) \qquad \text{(C-17)} $$
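
The synthesis can be sketched as a double sum. Two assumptions are made and should be read as such: each synthetic sub-string is taken to be the sum of its GCM wave-functions evaluated at the sample times, and the envelope width is tied to S through the 99% energy definition (constant `Z99`). Wave-functions whose center lies more than one spread away from the present time are skipped, following the remark that only the nearest ones need to be evaluated. All names are illustrative.

```python
import math

Z99 = 2.5758  # assumed envelope constant: 99% of envelope energy within S


def gcm(t, A, S, C, phi, F):
    """One GCM wave-function evaluated at time t (assumed envelope scaling)."""
    tau = t - C
    return A * math.exp(-(Z99 * tau / S) ** 2) * math.cos(2.0 * math.pi * F * tau - phi)


def synth_substring(params, times, reach=1.0):
    """Sum the wave-functions of one band at each sample time.

    params: list of (A, S, C, phi, F) tuples for the band.
    reach: wave-functions farther than reach * S from t are ignored.
    """
    out = []
    for t in times:
        acc = 0.0
        for (A, S, C, phi, F) in params:
            if abs(t - C) <= reach * S:
                acc += gcm(t, A, S, C, phi, F)
        out.append(acc)
    return out


def synth_string(bands, times):
    """Sum the synthetic sub-strings of all bands, as in Equation (C-17)."""
    subs = [synth_substring(params, times) for params in bands]
    return [sum(values) for values in zip(*subs)]
```

`synth_string` takes one parameter list per band (four lists for the fixed-filter system) and returns the estimated string at the requested sample times.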


To illustrate the accuracy of the new, one pass, GCM wave-function analysis/
synthesis system, representative phonemes and multiphoneme sounds from the male
and female voice have been analyzed and synthesized. Figure C-17a shows a
comparison between the original vs. synthetic waveforms of a 196 msec. segment
of Band 1 of the vowel /ɛ/ as in "met". Figure C-17b is a more detailed view
of a 35 msec. segment of the same vowel with the synthetic (dotted) waveform
plotted over the original. A comparison of the original vs. synthetic
waveforms of a 47 msec. segment of Band 2 of the same vowel is depicted in Figure C-18a.
Figure C-18b shows a detailed 35 msec. comparison of the same vowel. In Figure
C-19a the synthetic and original waveforms for Band 2 of a 119 msec. segment of
the vowel /a/ as in "all" are illustrated with Figure C-19b showing a 35 msec.
detailed comparison. Figure C-20 compares the synthetic and original waveforms
for Band 3 of the same vowel. Note that here, the analyzer fitted in only one
function per pitch period. Figure C-21 shows Band 1 of the synthetic and
original waveforms of the fricative consonant /f/ as in "for". A comparison
of Band 4 of the synthetic and original waveforms of the fricative consonant
/sh/ as in "she" is illustrated in Figure C-22. Figure C-23 shows Band 4 of
the synthetic and original waveforms of the stop consonant /t/ as in "to".

Looking at multi-phoneme sounds, Figure C-24 shows Band 1 of the word "me"
uttered by a female speaker. Both the nasal consonant /m/ and the vowel /i/
were sinusoidal and were both accurately represented. Figure C-25 illustrates
the "ei" part of the word "pfeifer" uttered by a male speaker. This figure
shows that even the coupling between vowels is accurately described by the
analyzer. Figures C-26a through e show a detailed comparison of each of the
four bands and the synthetic versus raw speech string for the word "pete"
spoken by a female. Examining Figure C-26a, which shows the four synthetic
sub-strings summed together to form the synthetic speech string, demonstrates how
accurately the synthetic speech duplicates the original.

The GCM analysis/synthesis system described in this section will be utilized
in the SEL-810B speech system currently under implementation.

Figure C-21  Band 1 of fricative consonant /f/ as in "for"; original and
synthetic waveforms.

Figure C-23  Band 4 of the original and synthetic waveforms of the stop
consonant /t/ as in "to".

Figure C-26  Comparison of 4 Bands and synthetic versus original speech string of
word "pete"; female speaker.


(d)  Computer Classification and Recognition of Phonetic Information

The research program in the classification and recognition of phonetic
information conducted since the last report has yielded very favorable
indications that recognition information can be extracted from the ASCφN
parameters and used to perform reliable recognition of connected speech.

The results which have been obtained which lead to this conclusion are as
follows:

1)  Extraction of valid frequency information (formant or otherwise) from
wave-function parameters.

2)  The ability to make a successful vowel map by plotting formant 1
versus formant 2 for steady-state vowels based on frequency data obtained
as described in (1) above.

3)  Determination of useful fixed-filter bands which can be used for
speech recognition.

4)  Recognition of steady-state vowels of a single speaker using 3
fixed-filter bands.

5)  Recognition of vowels embedded between two unvoiced phonemes for
a single speaker, using 3 fixed-filter bands.

6)  Preliminary study of the segmentation of connected phonemes using
4 fixed-filter bands.

These results are discussed in greater detail in the following sections.

Extraction of Valid Frequency Information from Wave-Function Parameters

In addition to the normal wave-function parameters, the present analyzer
also provides frequency information in the form of a parameter U given by


w  =  radian  frequency 


AT  =  57  microseconds  (sampling  interval  of 
A-to-D  conversion) 

From  this  relation  the  frequency  in  Hz.  can  be  calculated  as  a  function 
of  U  and  represents  the  frequency  of  the  center  of  the  wave-function 

f  =  tt^x  (C-l8) 

The  average  frequency  of  the  wave-function  can  also  be  calculated  from 
the  wave-function  parameters  N  and  S,  where 


In working with a wave-function analyzer one of the most important factors
in obtaining good parameters is the proper filtering of the raw speech data
into appropriate frequency bands or sub-strings. An example of such a
sub-string and its corresponding ASCφN and frequency parameters is shown in
Figure C-28. It is an 18 ms. portion of the 1200 - 2100 Hz. sub-string of
the vowel /æ/, as in "at". An examination of the frequency data shows that
the frequency from one wave-function to the next is not constant, even within
one pitch period. This is of course to be expected because of the bandwidth
involved and the frequency components which make up the voiced sound.

If it is desired to determine formant frequencies from this type of data
then the original speech stream must be filtered into two or three sub-strings,
where the bandwidth of each sub-string contains no more than one formant
frequency. Assuming the voiced sound is a steady state vowel, then an 18 ms.
section like that of Figure C-28 is representative of the entire vowel. A
close estimation of the formant frequency within each sub-string can now be
made by taking a weighted average of the frequencies of the wave-functions
within the 18 msec. interval. Each frequency term is weighted by the amplitude
of the wave-function it represents. Thus, those wave-functions with high
amplitudes contribute more power and are given more weight in the frequency
equation

$$ F = \frac{A_a f_a + A_b f_b + A_c f_c + \cdots}{A_a + A_b + A_c + \cdots} \qquad \text{(C-19)} $$


Using the amplitude and frequency data of Figure C-28, formant 2 for that
particular vowel was calculated according to Equation (C-19) to be 1835 Hz.
Figure C-29 is a magnitude vs. frequency plot for the same vowel /æ/,
unfiltered, during the same time interval as Figure C-28. The approximate
values of formant 1 and formant 2 are noted on this plot. It should also
be noted that the values of formant 2 from the plot and from Equation (C-19)
are within approximately 50 Hz. of each other. This kind of close relationship
has been demonstrated for the first two formants of all the vowels.
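
The amplitude-weighted average is easy to reproduce. The sketch below applies it to the nine amplitude and frequency pairs listed in the table of Figure C-28; the function name and the data constant are illustrative, and the pairs are transcribed from that table.

```python
# (amplitude A, frequency F in Hz) for the nine wave-functions listed in
# the table of Figure C-28 (transcribed values).
FIG_C28 = [
    (0.03357, 1871.2),
    (0.02056, 1871.2),
    (0.01696, 1909.4),
    (0.04852, 1776.5),
    (0.03640, 1846.5),
    (0.02517, 1871.2),
    (0.01696, 1858.8),
    (0.04855, 1765.2),
    (0.03640, 1871.2),
]


def weighted_formant(pairs):
    """Amplitude-weighted mean frequency: sum(A * f) / sum(A)."""
    total_amplitude = sum(a for a, _ in pairs)
    if total_amplitude == 0:
        raise ValueError("the amplitudes sum to zero")
    return sum(a * f for a, f in pairs) / total_amplitude
```

Applied to `FIG_C28`, the function returns a value within 1 Hz of the 1835 Hz quoted above for formant 2.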

Successful  Mapping  of  Formant  Frequencies  Derived  from  Wave-Function 
Parameters  for  a  Single  Speaker 

The key to valid formant frequency calculation from wave-function parameters
is the proper filtering around each formant. For connected speech this would
imply some sort of automatic formant tracking filter. Since no such tracking
filter exists it was necessary to simulate one. The simulation procedure
involved taking a Fourier transform of a representative sample of a steady-
state vowel. Figure C-30a shows an 18 msec. portion of the steady-state vowel
/u/, as in "boot". Its corresponding Fourier transform plot is shown in
Figure C-30b. From (b) the frequency cutoffs around the first two formants
were chosen as 144 - 720 Hz. and 720 - 1700 Hz. Figure C-31 depicts the data
VOWEL "AE" (AT), MID RANGE 1200 - 2100 Hz.
TIME WINDOW = 218.38 MS. TO 237.12 MS.

   A         S(MS.)    C(MS.)    φ(DEG.)     N       F(HZ.)

0.03357       2.44     219.16    -307.9     4.57     1871.2
0.02056       3.25     220.64    -258.2     6.09     1871.2
0.01696       2.16     224.35    -180.0     4.12     1909.4
0.04852       3.43     226.06    -306.6     6.09     1776.5
0.03640       2.10     227.71    -325.9     3.87     1846.5
0.02517       2.73     228.96    -126.6     5.12     1871.2
0.01696       2.22     232.84    -180.0     4.12     1858.8
0.04855       3.45     234.61    -307.3     6.09     1765.2
0.03640       2.63     236.26    -352.7     4.92     1871.2

Figure C-28  18 msec. portion of the vowel /æ/ as in "at" and its
corresponding parameters.

Figure C-30  (a) 18 msec. portion of the steady-state vowel /u/ as in "boot"
             (b) Fourier transform plot of (a)

Figure C-31  18 msec. portion of the steady-state vowel /u/ as in "boot"
filtered 144 - 720 Hz.

Figure C-32  18 msec. portion of the steady-state vowel /u/ as in "boot"
filtered 720 - 1700 Hz.

in the 144 - 720 Hz. band, and Figure C-32 depicts the data in the
720 - 1700 Hz. band. In both bands the data looks good from the
standpoint of having a nice wave-function structure.

The  filtering  operation  was  performed  using  a  convolution  program  in 
the  IBM  1800  computer,  resulting  in  an  approximation  to  an  ideal  filter. 

The  filtered  sub- strings  were  then  analyzed  and  formant  frequencies  cal¬ 
culated  from  the  wave- function  parameters. 

As mentioned, the first two formants for 10 steady-state vowels were
calculated according to Equation (C-19), and those results were plotted
as shown in Figure C-33. Each vowel was spoken at least 3 times by a single
speaker, and at different pitches. This plot indicates the "vowel loop"
for that speaker and shows reasonable separation between the vowels.

Verification of the results obtained from the procedure just described
was accomplished by two means. 1) The frequencies calculated from the
parameter data were compared with the locations of the formant peaks on
the Fourier transform plot. 2) The overall results of Figure C-33 were
compared with those of the Peterson and Barney plot (1), which is a plot
of formant 1 vs. formant 2 for 76 speakers - men, women, and children.

While the formant map of Figure C-33 looks as though it could provide a
good foundation for vowel recognition, there would still remain the problem
of filtering. As was mentioned in the discussion of the preprocessor, the
simulation of an automatic tracking filter required a large amount of computer
time in the recognition studies. As a result, a set of fixed-frequency
parameters was defined and is now being used in the recognition process.

Fixed-Filtering as Related to Speech Recognition


As indicated in the discussion of the preprocessor, the fixed-filter
parameters were chosen so that they contained relevant information for
recognition. The wave-function parameters resulting from an analysis of
the fixed-filter sub-strings for the word "steek" (Figure C-5) are shown
in Tables C-1 through C-4. Table C-4 is incomplete because of the large
number of parameters involved. Besides the five wave-function parameters,
each table contains the average frequency of each wave-function, the
difference between successive principal C parameters (labeled DELTAC), and
the difference between successive principal frequency terms (labeled DELTAF).

The principal frequency and principal C just mentioned are those associated
with what is called the principal wave-function. The definition of a principal
wave-function during a voiced sound is: that wave-function having the maximum
amplitude of all the wave-functions within one pitch period. Therefore, there
is one principal wave-function associated with each pitch period. In most
cases the voiced phonemes in the 100 - 400 Hz. sub-string are made up of one
wave-function per pitch period, so every one is a principal wave-function.
This can be seen by looking at Table C-1. The principal wave-functions are
marked by a star next to the amplitude parameter. DELTAC in this case is a
valid pitch period measurement and shows the change in pitch as a function
of time, thus providing information on voice inflection and accentuation.

It is also possible to have seven or more wave-functions per pitch
period. When this occurs, the one with the largest amplitude is classified
as a principal, while the remaining wave-functions are classified as followers.
Referring back to the sub-string shown in Figure C-28, the principal
wave-functions occur at times A and B. The parameter listings of Table C-4 for
the vowel portion of the word "steek" likewise show the relationship between
the principal wave-functions and the followers.

Table C-1 Wave-function parameters for the 100 - 400 Hz. sub-string of "steek".

Table C-2 Wave-function parameters for the 400 - 900 Hz. sub-string of "steek".

Table C-3 Wave-function parameters for the 900 - 1800 Hz. sub-string of "steek".

Table C-4 Wave-function parameters for the 1800 - 3600 Hz. sub-string of the
          word "steek".
          (a) a portion of the /s/
          (b) a portion of the /ee/

[Tabulated parameter values for Tables C-1 through C-4 are not legible in this copy.]

During an unvoiced phoneme and some phoneme transitions the wave-functions
are somewhat random or burst-like. For these cases there is no strict
principal-follower definition. Therefore, a wave-function is selected
as a principal if its amplitude is within 20% of the amplitudes of the
wave-functions both before and after it, or if its amplitude is greater than
80% of the amplitude of the previous principal wave-function.
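
The selection rule for unvoiced and transitional segments can be sketched as follows. This is only an illustration, not the report's sort program: the function name is hypothetical, and "within 20%" is interpreted here as a relative difference of at most 20% measured against each neighbor, which is an assumption since the text does not define it precisely.

```python
def select_principals_unvoiced(amplitudes, tol=0.20, frac=0.80):
    """Return indices of wave-functions classed as principals.

    amplitudes: positive amplitude parameters in time order.
    tol:  allowed relative difference from both neighbors (assumed meaning
          of "within 20%").
    frac: fraction of the previous principal's amplitude that must be exceeded.
    """
    principals = []
    last_principal_amp = None
    last = len(amplitudes) - 1
    for i, a in enumerate(amplitudes):
        near_neighbors = (
            0 < i < last
            and abs(a - amplitudes[i - 1]) <= tol * amplitudes[i - 1]
            and abs(a - amplitudes[i + 1]) <= tol * amplitudes[i + 1]
        )
        beats_previous = (
            last_principal_amp is not None and a > frac * last_principal_amp
        )
        if near_neighbors or beats_previous:
            principals.append(i)
            last_principal_amp = a
    return principals
```

With the amplitudes `[1.0, 1.1, 1.0, 0.3, 0.9, 0.2]`, the second wave-function qualifies through its neighbors, and the third and fifth qualify by exceeding 80% of the preceding principal.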

For the word "steek", Band 1 is described by 24 parameter sets, Band 2 by
34 parameter sets, Band 3 by 20 parameter sets, and Band 4 by 201 parameter
sets. The total for the entire word is 279 parameter sets. If the principal
wave-functions are sorted from each band, then Band 2 is reduced to 29
parameter sets, Band 3 to 19 parameter sets, and Band 4 to 75 parameter sets,
while Band 1 remains at 24. This results in a total of 147 parameter sets for
the whole word. The usefulness of these sorted parameters will be shown later.

Recognition of Steady-State Vowels of a Single Speaker Using Three Fixed-Filter Bands

The algorithm for the recognition of steady-state vowels was implemented
in FORTRAN on the IBM 1800 computer. Due to limitations in memory, however,
only three bands could be operated upon, but the results were still quite
successful. A frequency parameter was calculated for each sub-string using
Equation (C-10) and wave-function parameters for an 18 msec. portion of the
vowel. This was not a true formant frequency, but it was used in the same
manner. An amplitude parameter was assigned to each sub-string by extracting
the amplitude parameter for the first principal wave-function occurring within
the same 18 msec. interval. Examination of the frequency and amplitude data
for 12 vowels revealed that the frequency in Band 1 did not vary significantly
enough to warrant its use as a recognition parameter. Because of the higher
formant content of Bands 2 and 3, the frequency parameters for these bands
were plotted against each other and yielded a useful map with reasonable
separation in most cases. (See Figure C-34.)

Figure C-34 Steady-state vowel frequency plot.
            F2 = Weighted average frequency in 400 - 900 Hz. Band
            F3 = Weighted average frequency in 900 - 1800 Hz. Band
            Single Speaker

A second vowel map was made utilizing the amplitude parameters. In most
cases the amplitude relationships between sub-strings seemed to vary from
vowel to vowel. When the amplitudes in Bands 2 and 3 were first normalized
by the amplitude of Band 1 and then plotted against each other, the result
was the second map, with reasonable separation between vowels. (See Figure C-35.)

The recognition algorithm itself was based on a simple binary decision
tree. Decision lines were drawn through the frequency map, isolating each
of the vowels or groups of vowels as much as possible. The final recognition
decision was then made after an examination of the amplitude map, on which
decision lines were also drawn. The decision tree used is shown in Figure C-36.

For a single speaker the results were more than 95% correct on the 12
vowels, making use of only the first three fixed-filter bands. An exhaustive
study and test was not conducted because of the limited usefulness of a
steady-state vowel recognizer. However, each vowel was spoken from 2 to 3
different ways, and the vowel /i/, as in "beet", was spoken 12 times at
various pitches. Enough information was gathered, and the results were
conclusive enough, to demonstrate the feasibility of using wave-function
parameters for phoneme recognition. The next step seemed to be a more
realistic one, that of vowel phonemes connected to other phonemes.

Figure C-35 Steady-state vowel map as a function of Band 2 and Band 3 amplitudes.

Recognition of Vowel Phonemes Embedded between Two Unvoiced Phonemes for a
Single Speaker Using 3 Fixed-Filter Bands

The idea of embedded vowel recognition suddenly becomes complicated by
the following facts:

1) With steady-state vowels the formant frequencies remained fairly
constant throughout the duration of the vowel. Therefore, the weighted
average frequency associated with each filter band remained fairly constant.
Now the formants (and therefore the average frequency within each band) vary
as a function of time, depending upon the phonemes before and after the vowel
and the amount of coupling between them.

2) The amplitude of steady-state vowels remained fairly constant throughout
the vowel. Now the amplitude is generally some form of increasing-decreasing
function.

3) Because of the continuous nature of the steady-state vowel, almost
any time interval could be used as a representative sample of that vowel. Now
the time during which the vowel occurs must first be determined before
recognition information can be derived from the wave-function parameters.

Due to the use of FORTRAN, this recognition scheme was also restricted
to the first three fixed-filter bands. Because of the good results of the
steady-state vowel recognizer, it was decided to use amplitude and frequency
as the recognizer inputs. The only restriction on the spoken word was that
the phoneme content be unvoiced phoneme, voiced vowel, unvoiced phoneme.

The first step in the process was to do a sort on the first three bands.
The sort program, as described previously, finds the principal wave-function
associated with each pitch period and discards the rest. This indicates a
pitch-synchronous type of recognition. A scan was then made of Band 1 to
determine the time when the pitch phenomenon began and ended, the result
being the time during which the vowel occurred.

This time window was then projected to Bands 2 and 3. If there were
more than 5 pitch periods during the vowel, the first two and last two
wave-functions were discarded in order to help eliminate phoneme transition
effects. A simple average was then calculated for the remaining principal
frequencies in Bands 2 and 3.

It has been noted that the principal wave-functions in each band are
aligned with each other to within ±3 msec. To perform valid amplitude
normalization, the amplitude of each principal wave-function occurring during
the vowel in Band 2 was divided by the amplitude of the corresponding
principal wave-function in Band 1. A simple average was then taken of the
resulting amplitude ratios, yielding an overall A2/A1. The same procedure
was followed to determine A3/A1.

An example of the procedure is as follows. Figure C-37 is a picture
of the vowel portion of the word "sock" and its first three corresponding
sub-strings. After an analysis and sort on each band, a scan was made of
Band 1 in order to determine the time occurrence of the vowel segment. The
vowel time-window is shown at the top of Figure C-38. Also in the figure are
the principal A, C, and F parameters for the first three bands which occurred
during the defined time window. An alignment procedure was then performed.
Each wave-function in Band 1 was compared, in time, with those of the other
two bands. Those that occurred within 3 msec. of each other in all three
bands were saved, the others discarded. The A, C, and F parameters remaining
after such an alignment are shown in Figure C-39. From these data, each
amplitude parameter in Band 2 was divided by the corresponding amplitude
parameter in Band 1, yielding a set of A2/A1 ratios. The same was done to
generate a set of A3/A1 ratios, and these two ratio sets are shown in
Figure C-40. Also given in that figure are the simple averages of these two
sets, thus giving two recognition parameters. Figure C-40 also shows the
results of taking the simple average of the frequency terms of each band
contained in Figure C-39. The average frequencies calculated for Bands 2
and 3 were also used as recognition parameters.
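
The alignment and averaging steps of this example can be sketched in a few lines. This is a hedged illustration rather than the original FORTRAN program: the function name, the tuple layout `(A, C, F)`, and the choice of matching each Band 1 wave-function to its nearest neighbor in time in the other bands are all assumptions. Discarding the first two and last two pitch periods is not included.

```python
def align_and_average(band1, band2, band3, tol_ms=3.0):
    """Align principal wave-functions across three bands and average them.

    Each band is a list of (A, C, F) tuples for the principal wave-functions
    inside the vowel time window, with C in msec. and F in Hz.
    Returns the four recognition parameters as a dict.
    """
    if not (band1 and band2 and band3):
        raise ValueError("each band needs at least one wave-function")

    def nearest(c, band):
        # Wave-function in `band` whose centre time is closest to c.
        return min(band, key=lambda wf: abs(wf[1] - c))

    kept = []
    for wf1 in band1:
        wf2 = nearest(wf1[1], band2)
        wf3 = nearest(wf1[1], band3)
        if abs(wf2[1] - wf1[1]) <= tol_ms and abs(wf3[1] - wf1[1]) <= tol_ms:
            kept.append((wf1, wf2, wf3))
    if not kept:
        raise ValueError("no wave-functions aligned within the tolerance")

    n = len(kept)
    return {
        "A2/A1": sum(w2[0] / w1[0] for w1, w2, _ in kept) / n,
        "A3/A1": sum(w3[0] / w1[0] for w1, _, w3 in kept) / n,
        "F2": sum(w2[2] for _, w2, _ in kept) / n,
        "F3": sum(w3[2] for _, _, w3 in kept) / n,
    }
```

The two amplitude ratios and the two average frequencies returned here correspond to the four quantities that were placed on the amplitude and frequency maps.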

As with the steady-state vowel recognition, the two amplitude terms were
plotted against one another, as were the two frequency terms, resulting in two
embedded vowel maps. Decision lines were drawn first on the frequency map
to isolate the vowels as much as possible, and the final recognition decisions
were based on the amplitude map. A binary decision tree was then used as
the basic recognition algorithm, based on the decision information extracted
from the two vowel maps.

The results of this recognition technique were good, though not as
impressive as those for steady-state vowels. Out of 32 words, each containing
one vowel, only 4 wrong decisions were made (87.5% correct). Words were chosen
such that each of 12 vowels was done at least twice. Some of the words spoken
were: coast, sit, hat, sock, foot, etc.

Misrecognition occurred on the vowels embedded in the following words:
"gus", "get", "tug", and "shook". The reason for errors on the first three
can be attributed to the fact that the vowel in each was coupled with a voiced
/g/ phoneme. Therefore, transition effects were included as part of the vowel
and resulted in erroneous recognition parameters.

Because of errors like this it became clear that the first step in
this procedure should be expanded before any more recognition was attempted.
This step was the one which determines the time during which the vowel occurs.
It was decided to make this a general purpose segmentation, where the time
occurrence of each phoneme would be defined.

Figure C-37 Vowel portion of the word "sock" and its first three
            sub-strings.

Figure C-38 Principal A, C, and F parameters occurring during the vowel portion
            of the word "sock".

The development of a general purpose segmentation algorithm is currently
underway. It will be implemented on the IBM 1800 speech system using assembly
language, thus eliminating some of the memory restrictions incurred with
FORTRAN on this computer. As a result, the data from all four fixed-filter
bands will be utilized for the segmentation and later for recognition.
Because of the high information content of these four frequency bands, as
described earlier, a highly accurate phoneme segmentation will be possible.
This means that phoneme transition effects can be reduced or eliminated.
Recognition can then be started with the vowel set and easily be expanded to
the entire phoneme set.

(e)  Data  Compression  Studies 

The study of data compression is closely related to the problem of speech
recognition, with the possibility that some techniques (e.g. segmentation) may
be shared. Therefore, for both compatibility and convenience, the frequency
bands selected for the ASCON streams were chosen to be those used for the
recognition studies. The basic speech waveform is thus assumed to have been
bandlimited to .1 - 3.6 kHz., and the four sub-bands employed are .1 - .4,
.4 - .9, .9 - 1.7, and 1.7 - 3.6 kHz.

Two general areas of study have been emphasized in the preliminary
compression studies completed to date. The first is the tabulation of the data
rate of the "highest fidelity" ASCON representation. This involves making
the best possible fit to the speech waveform, in the time domain, with Gaussian
wave-functions and determining the required bit rate without further
compression. This is estimated by computing the number of wave-functions
required and assuming eight bits for each of the five parameters. Studies of
vowels have shown bit rates on the order of 50,000 - 60,000 bits/second for
the raw ASCON stream. This improves somewhat for full words (approx.
30,000 - 50,000), and even more for phrases, since the ASCON representation
automatically ceases for any quiet periods, no matter how short.
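
The bit-rate estimate described above reduces to simple arithmetic, sketched below. The function name is hypothetical, and the duration of the utterance is an input the text does not state, so the figures in the usage note are illustrative assumptions only.

```python
def ascon_bit_rate(num_sets, duration_s, params_per_set=5, bits_per_param=8):
    """Estimated bits/second for a raw ASCON stream.

    num_sets:   number of wave-function parameter sets in the utterance.
    duration_s: length of the utterance in seconds (must be positive).
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return num_sets * params_per_set * bits_per_param / duration_s
```

For instance, 500 parameter sets over an assumed 0.4 second of sound give `ascon_bit_rate(500, 0.4)`, or about 50,000 bits/second, which is the order of magnitude quoted for vowels.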

The second general area of study has been an attempt to determine how many
of these wave-functions are superfluous from a perceptual standpoint. Since
the elimination of redundant parameter sets is extremely simple to implement
(given a criterion for redundancy), this seemed a natural place to begin rather
than to initially attempt those approaches involving encodement of the ASCON
sets. A sort program has been implemented to select the "principal"*
wave-functions in the four frequency bands, and to reconstruct a replica of the
sound based only on the principal wave-functions. In the lower two bands, most
of the wave-functions qualify as principals; however, there is considerable
redundancy in the two higher bands. Table C-5 illustrates results for a
typical vowel sound /i/ and for the word "dirt". The sounds of the
reconstructed waveforms for the two cases are perceptually the same whether
all the ASCON sets are employed or only the principals. It should be noted
that no reduction at all was made in the first two bands. This result can be
improved on by the proper definition of a principal wave-function. However,
the improvement to the total bit rate is small, and was ignored for this
illustration.

* The appropriate definition of a principal wave-function is still a subject of
study. Currently it appears that if the amplitude of a wave-function is either
a local maximum or within 80% of one of its neighbors, it should be so classed.

                                 Number of ASCON Sets

/i/
Frequency Band             Original     After Sort Program

100 - 400 Hz.                  47               47
400 - 900 Hz.                  65               65
900 - 1700 Hz.                137               42
1700 - 3300 Hz.               320              100
Total                         569              254
Data Rate (bits/sec.)      54,500           24,200

"Dirt"
Frequency Band             Original     After Sort Program

100 - 400 Hz.                  46               46
400 - 900 Hz.                  85               85
900 - 1700 Hz.                116               35
1700 - 3300 Hz.               176               54
Total                         423              220
Data Rate (bits/sec.)      40,400           21,000

Table C-5

In addition to the sort on principal wave-functions, experiments are being
conducted to evaluate the discarding of ASCON sets with an amplitude less than
a (normalized) threshold. The threshold has been normalized in one of several
ways. Since the largest amplitude is generally normalized, the simplest
procedure is to set the threshold to a fixed value. This implies that a
wave-function will be discarded if the amplitude is less than some specified
fraction of the largest wave-function occurring in any of the four bands. A
somewhat more effective procedure also normalizes with respect to the largest
amplitude in the band and sets a threshold test (at perhaps a different level).
This allows a consideration of the possible differences in amplitudes from
band to band. These procedures appear to be capable of reducing the data rate
by an additional factor of two without changing the perceptual content of the
sound.
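
The two threshold tests just described can be sketched as below. This is an illustrative sketch only: the function name and the particular threshold fractions are hypothetical, since the text gives no numerical levels, and each band is represented simply as a list of amplitude parameters.

```python
def discard_below_threshold(bands, global_frac, band_frac=None):
    """Drop wave-functions whose amplitude falls below a normalized threshold.

    bands:       list of lists of amplitude parameters, one list per band.
    global_frac: threshold as a fraction of the largest amplitude in any band.
    band_frac:   optional second threshold, as a fraction of the largest
                 amplitude within the same band.
    """
    peaks = [max(b) for b in bands if b]
    if not peaks:
        return [[] for _ in bands]
    global_peak = max(peaks)

    kept = []
    for b in bands:
        survivors = [a for a in b if a >= global_frac * global_peak]
        if band_frac is not None and b:
            band_peak = max(b)
            survivors = [a for a in survivors if a >= band_frac * band_peak]
        kept.append(survivors)
    return kept
```

Calling the function with only `global_frac` gives the fixed-threshold procedure; supplying `band_frac` as well adds the per-band test, which lets a weak band keep its own strongest wave-functions while still pruning its small ones.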

A final procedure, somewhat more cumbersome to implement, is to normalize
with respect to the largest amplitude in a moving time window and employ a
threshold test. This has so far resulted in a disappointingly small improvement
over the previous cases.

As soon as a reliable segmentation program is developed, work will commence
on encoding ASCON sets during vowel-like sounds, perhaps by n-level delta
modulation techniques, using the segmentation as the guide as to when to start
and stop the encoding procedure.

(f) Interrelationships between a Wave-Function Representation and a
Formant Model of Speech*

Theoretical and empirical investigations have been conducted which interrelate
the parameters of the wave-function representation to those of a classical
formant model. Two points should be made regarding these relationships.

* Markel, John, "On the Interrelationships between a Wave-Function Representation and
a Formant Model of Speech", PhD Dissertation, Univ. of Calif., Santa Barbara, July, 1970.

First, the transformations are one way; that is, parameters of the
wave-function model are transformed to the formant model only. The reason is
that the formant model defines what can be considered as a fundamental set of
parameters for voiced speech. In terms of information theory, there is no
other known set of acoustic parameters which is capable of describing the
essential character of the vowel sounds with lower information capacity.
It has been shown that, in general, description of a vowel sound in terms
of the wave-function parameters requires many more parameters. What have
been developed, then, are several many-to-one transformations which map the
wave-function parameter set into estimates of the formant parameter set.

The second point to be made is that the interrelationships to be presented
are empirical ones based upon reasonable engineering assumptions along
with some theoretical justification. As will be shown later, these
relationships have given very good results in predicting parameters of this
formant model.

In order that the true parameters might be known, the study was conducted
on a set of synthetic vowels. (The procedures have also been applied to real
speech, and give reasonable results, but since all procedures for estimating
the formant parameters are subject to error, analyzing real speech cannot
give a base of reference for error analysis.) The synthetic speech which
was analyzed had a realistic glottal driving function, periodicity, a radiation
and a correction term.

A final comment relates to preprocessing of the vowels before analysis.
It has been observed that a sufficient condition for wave-function isolation
is that major energy regions be isolated during the analysis. Although the
regions were separated manually for this study, it appears reasonable to
assume that this separation could be accomplished automatically except for
the back vowels such as /ɔ/ and /o/. Suzuki* has considered one approach
to automatic separation of formant regions by moment methods.

For the cases where two closely spaced formants cannot be resolved
automatically, a separate algorithm was developed for estimating both formants
and bandwidths from the wave-function parameters that define the region
containing two formants. As for the filters, three contiguous filters of the
sin x/x type are used to define the range (0,3000) Hz. for each vowel (except
for /i/, which has a range of (0,3500) Hz.).

The proposed method for estimating formant parameters from wave-function
parameters depends upon each formant region being isolated (except where two
closely spaced formants are known to reside within a single filtered region).
Also, the success of the method depends upon being able to isolate single pitch
periods. This requirement is necessary for the estimation of bandwidth. If
only estimation of the formant frequencies were desired, this requirement could
be eliminated.

The first step in estimating formant parameters from wave-function
parameters is to isolate a single representative pitch period of the synthetic
vowel generated from the wave-function parameters. The set of parameters
representing the vowel is then used as the input set for the transformation
equations. The following algorithm is used for isolating a single pitch
period from the wave-function parameter sets.

* Suzuki, T., Y. Kadokawa, and K. Nakata, "Formant-Frequency Extraction by the Method
of Moment Calculations," JASA, Vol. 35, September 1963, pp. 1345-1353.
Over a time interval containing at least one pitch period, the A parameter
list is searched for a maximum. (The number of parameter sets considered is
determined by noting that the corresponding C parameters must be within the
chosen time interval.) The five-parameter set corresponding to this
maximum A is defined by

$$\omega(1,n) = [A(1,n),\ S(1,n),\ C(1,n),\ \phi(1,n),\ F(1,n)]$$

where n denotes the particular filtered region, n = 1,2,3. The start of the
next period is determined by finding the maximum A parameter whose corresponding
C parameter satisfies $T_{MIN} < C - C(1,n) < T_{MAX}$, where $T_{MIN}$ and $T_{MAX}$
define the minimum and maximum expected pitch periods. Call this set $\omega(M_n+1, n)$.

For region n, the parameter set defining the isolated pitch period is then
given by

$$[\omega(1,n), \ldots, \omega(M_n,n)]$$

where $M_n$ in general is different for each n. Note that, in general, the pitch
period will be slightly different for each region.
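The search just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `WaveFunction` and `isolate_pitch_period`, the argument names, and the choice to return the sets from the first maximum up to (but not including) the start of the next period are hypothetical and are not taken from the original programs.

```python
from typing import List, NamedTuple


class WaveFunction(NamedTuple):
    """One wave-function parameter set (A, S, C, phi, F) for a single region."""
    A: float    # envelope peak amplitude
    S: float    # envelope spread
    C: float    # time of the envelope peak, seconds
    phi: float  # phase
    F: float    # frequency, Hz


def isolate_pitch_period(sets: List[WaveFunction], t_start: float, t_end: float,
                         t_min: float, t_max: float) -> List[WaveFunction]:
    """Return the parameter sets spanning one pitch period (hypothetical helper)."""
    ordered = sorted(sets, key=lambda w: w.C)
    # Only sets whose C parameter lies inside the chosen interval are searched.
    window = [w for w in ordered if t_start <= w.C <= t_end]
    if not window:
        raise ValueError("no parameter sets inside the search interval")
    first = max(window, key=lambda w: w.A)
    # The next period starts at the largest A whose C satisfies
    # T_MIN < C - C(1,n) < T_MAX.
    candidates = [w for w in ordered if t_min < w.C - first.C < t_max]
    if not candidates:
        raise ValueError("no candidate found for the start of the next period")
    nxt = max(candidates, key=lambda w: w.A)
    return [w for w in ordered if first.C <= w.C < nxt.C]
```

Because the search is run separately for each filtered region, calling the function once per region naturally yields the slightly different pitch periods noted above.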

One  other  fact  that  needs  to  be  emphasized  is  that  this  algorithm  is 
not  being  proposed  as  a  useful  method  for  extraction  of  fundamental  frequencies 
from  arbitrary  speech.  It  is  simply  a  reasonable  method  for  extracting  the 
pitch  period  from  the  Digital  Vowel  Synthesizer  (DVS)  automatically,  where 
all  glottal  pulses  are  defined  as  identical.  This  is  certainly  not  the  case 
in real voiced speech. The problem of estimating formant parameters can be
formulated in the following way. For region n, n = 1,2,3, the wave-function
representation of the isolated vowel pitch period will be

$$\sum_{i=1}^{M_n} \psi[\omega(i,n), t] \tag{C-20}$$

where

$$\psi[\omega(i,n), t] = A(i,n)\,\exp\left[-\pi^2 \big(t - C(i,n)\big)^2 / S^2(i,n)\right] \cos\left[2\pi F(i,n)\,t - \phi(i,n)\right]$$

The Fourier transform of these wave-function components is

$$\psi[\omega(i,n), jf] = \frac{A(i,n)\,S(i,n)}{2\sqrt{\pi}} \exp\left\{-[f - F(i,n)]^2 S^2(i,n) - j[\phi(i,n) + 2\pi f C(i,n)]\right\}$$

From this last expression, note that near f = F, the maximum value of the
magnitude spectrum is obtained as approximately

$$|\psi(\omega, jF)| = \frac{A\,S}{2\sqrt{\pi}} \tag{C-21}$$

The equation says that the peak value of the spectrum of any wave-function
is proportional to its AS product. This leads to the conjecture that over
any time interval P, the importance of any wave-function in relation to the
overall spectral result can be ranked in terms of the individual AS products
of each wave-function. With this information, the conjecture can be made
that each of the F terms contributes to the formant frequency roughly in
proportion to its corresponding AS product, giving the formant frequency
estimator

$$\hat{F}_n = \frac{\sum_{i=1}^{M_n} A(i,n)\,S(i,n)\,F(i,n)}{\sum_{i=1}^{M_n} A(i,n)\,S(i,n)}, \qquad n = 1,2,3 \tag{C-22}$$
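As a small numerical illustration of the AS-weighted estimator (C-22), the following Python sketch computes the weighted mean for one region. The function name and the sample values are hypothetical and serve only to show the arithmetic.

```python
from typing import Sequence


def estimate_formant_frequency(A: Sequence[float], S: Sequence[float],
                               F: Sequence[float]) -> float:
    """AS-weighted mean of the F parameters over one isolated pitch period."""
    if not (len(A) == len(S) == len(F)) or len(A) == 0:
        raise ValueError("A, S and F must be non-empty and of equal length")
    weights = [a * s for a, s in zip(A, S)]  # AS product of each wave-function
    total = sum(weights)
    if total == 0:
        raise ValueError("the sum of AS products is zero")
    return sum(w * f for w, f in zip(weights, F)) / total
```

For example, two wave-functions with AS products in the ratio 2:1 and frequencies of 700 Hz and 760 Hz give an estimate of 720 Hz, pulled toward the stronger component.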


The estimation of formant bandwidth has traditionally been a frequency domain
operation with few attempts in the time domain. A problem which exists in
frequency domain methods is that of consistently overestimating the bandwidth
value. Even if the discrete Fourier transform is obtained with high resolution
(on the order of 3 - 10 Hz), direct estimation of bandwidth from the spectrum
will usually fail due to the effect of the periodicity, which causes zeros
(or oscillations) in the spectrum, and the effect of the glottal wave and
radiation term (which combine to act as a low-pass or smoothing filter).
The work of Dunn*, which is widely quoted, depended upon fitting to the spectrum
templates that had impulse responses identical to the exponentially damped
sinusoids used as sections of the formant model. Application of the
wave-function analysis approach to filtered vowel sounds suggests a different
type of mathematical "template fitting." The parameters which describe the
envelope fit are the A, S, and C parameters. A describes the envelope peak
amplitude; S, the envelope spread (approximate time interval containing the
energy of the wave-function); and C, the location in time of the envelope
peak. If a filtered region is obtained which has one and only one formant
present, the effective damping of the time domain segment is strongly correlated
with the actual bandwidth of the formant within that region.

A simple estimate of $B_n$ can be obtained from the wave-function parameters
by consideration of the following mathematics.

A single resonator can be described in the frequency domain as

$$Y(s) = \frac{2\pi F_n}{(s + \pi B_n)^2 + (2\pi F_n)^2} \tag{C-23}$$

with the time domain equivalent

$$y(t) = \exp(-\pi B_n t)\,\sin(2\pi F_n t)$$

At the positive peaks corresponding to times $t_p$ and $t_{p+q}$,

$$y(t_p) = \exp(-\pi B_n t_p)$$

*Dunn, H. K., "Methods of Measuring Vowel Formant Bandwidths," JASA, Vol. 33, No. 12,
December 1961, pp. 1737-1746.
At $t_{p+q}$, $y(t_{p+q}) = \exp(-\pi B_n t_{p+q})$. Therefore, $B_n$ can be obtained by
dividing the above equations and taking the logarithm of each side. This
results in

$$B_n = \frac{1}{\pi\,(t_{p+q} - t_p)} \ln\left[\frac{y(t_p)}{y(t_{p+q})}\right] \tag{C-24}$$


This suggests that the amplitude parameters A(1,n) and A(i,n) of two
wave-functions, and their corresponding time separation C(i,n) - C(1,n),
might be substituted into this last equation to produce bandwidth estimates
of the form

$$\hat{B}_{n,i} = \frac{\ln[A(1,n)/A(i,n)]}{\pi\,[C(i,n) - C(1,n)]} \tag{C-25}$$


Several filtered segments (where formants are isolated) are shown in the
figures with the corresponding envelope estimations generated by using the
A, S, and C parameters of the wave-function representation.

These waveforms suggest a bandwidth estimator composed of fitting
exponential curves through the envelope peaks of the various wave-functions,
using a starting point of A(1,n) located at C(1,n), and then averaging in some
form the estimated bandwidths. The simplest estimator in terms of the
wave-function parameters, in the sense that each estimate $\hat{B}_{n,i}$ is given
equal weight, is thus

$$\hat{B}_n = \frac{1}{M_n - 1} \sum_{i=2}^{M_n} \hat{B}_{n,i} \tag{C-26}$$
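The two-step bandwidth estimate, a log-ratio of envelope peaks for each later wave-function followed by an equally weighted mean, can be sketched in Python as follows. The function name and the synthetic test envelope are hypothetical; the sketch assumes the first list entry is the starting wave-function A(1,n) at C(1,n) and that amplitudes are positive.

```python
import math
from typing import Sequence


def estimate_bandwidth(A: Sequence[float], C: Sequence[float]) -> float:
    """Mean of the individual estimates ln[A1/Ai] / (pi * (Ci - C1)), in Hz."""
    if len(A) != len(C) or len(A) < 2:
        raise ValueError("need at least two wave-functions with matching A and C")
    estimates = []
    for i in range(1, len(A)):
        dt = C[i] - C[0]
        if dt <= 0:
            raise ValueError("later wave-functions must follow the first in time")
        # One estimate per later wave-function, relative to the first peak.
        estimates.append(math.log(A[0] / A[i]) / (math.pi * dt))
    # Equal weight for each individual estimate.
    return sum(estimates) / len(estimates)
```

When the peak amplitudes fall exactly on the envelope exp(-pi B t) of a single resonator, every individual estimate equals B, so the mean recovers B as well.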


Figures C-44 through C-46 show the impulse response of the corresponding
vocal tract model section with the estimated $F_n$ and $B_n$ parameters vs. the
overall wave-function representation.


Formant amplitudes are the least important of the formant parameters.
Fant* has shown that by knowing the formant frequencies and bandwidths, the
spectral amplitudes can be related along with the amplitude of the vocal
tract impulse response. A physical verification that this is possible can
be obtained by designing a cascaded vocal tract synthesizer such as the DVS.
With this type of model, formant amplitude is not even specified. However,
for the sake of completeness, an estimate of spectral amplitude in terms of
wave-function parameters has been derived. If $\hat{F}_n$ is the formant frequency
estimate from the wave-function parameters, the formant amplitude estimate
can be made from

$$\hat{A}_n = \left|\, \sum_{i=1}^{M_n} \psi[\omega(i,n),\, j\hat{F}_n] \,\right| \tag{C-27}$$

where $\psi$ is defined as earlier.


Figure C-41. Isolated pitch segment for /A/, (0, .9) kHz. Dotted lines
indicate effective damping.


*Fant, C. G., "On the Predictability of Formant Levels and Spectrum Envelopes
from Formant Frequencies," 1956, pp. 109-120.
To check the validity of the proposed interrelationships, it was decided
to generate ten different synthetic vowels and perform a complete analysis
upon each to determine the estimate of the vocal tract model parameters.
The vowels chosen were those of the Peterson-Barney* study with the average
formant frequency values they suggested. The bandwidths were determined by
the equation

$$B_n = (45 + 5F_n^3) \times 10^{-3}\ \text{kHz}, \qquad 0 < F_n < 3\ \text{kHz}$$

with $F_n$ in kHz, which is a good fit to the bandwidth data presented by Dunn
over the region (0,3) kHz.
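The bandwidth rule is partly illegible in the printed equation, so the following Python check rests on an assumption: that the isolated "3" is an exponent on the formant frequency, giving B = 45 + 5F^3 Hz with F in kHz. Under that reading the rule reproduces, after rounding, the reference bandwidths of 47, 46, 51, and 48 Hz listed in Table C-6 for the Peterson-Barney formants at 730, 570, 1090, and 840 Hz. The function name is hypothetical.

```python
def synthetic_bandwidth_hz(formant_khz: float) -> float:
    """Bandwidth in Hz from formant frequency in kHz (assumed cubic reading)."""
    if not 0.0 < formant_khz < 3.0:
        raise ValueError("the fit is only stated for 0 < F < 3 kHz")
    # (45 + 5 F^3) x 10^-3 kHz is the same as (45 + 5 F^3) Hz.
    return 45.0 + 5.0 * formant_khz ** 3


# Reference formants (kHz) for the two back vowels treated in Table C-6.
table_c6_formants_khz = [0.73, 0.57, 1.09, 0.84]
rounded = [round(synthetic_bandwidth_hz(f)) for f in table_c6_formants_khz]
```

A linear reading of the same equation (45 + 5F) does not reproduce the tabulated values, which is the reason the cubic form is preferred here.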

As previously discussed, the formant frequency is of most importance
in the specification of vowel sounds. Over the ten vowels, the maximum
error in the estimation of $F_1$ was 105 Hz for the vowel /æ/. In the $F_2$
region the maximum error was 80 Hz, again for /æ/. For the $F_3$ region, /i/
had the maximum error of 430 Hz. These results are displayed for the ten
vowels used in Figure C-47. The large errors in the $F_3$ region are believed
due to the sampling rate of the system. Certainly as the effective frequency
of the signal is increased, the measurement errors will increase. These
errors can be reduced by either increasing the system sampling rate or by
interpolating between points to estimate the extrema locations, or possibly
by both methods. With these modifications, measurement accuracy of the $F_3$
region should be as good as that of the $F_1$ region. The use of an interpolative
method to determine the F parameter was investigated, and the results show
a marked reduction in error for region three. The maximum error in formant

*Peterson, G. E., and H. L. Barney, "Control Methods Used in a Study of the
Vowels," JASA, Vol. 24, 1952, pp. 175-184.
frequency for the three bands was then 80 Hz, 65 Hz, and 65 Hz, respectively.
This technique also gave a marked reduction in the standard deviation of the
errors for the third region.

The maximum errors in the bandwidth estimation for the three regions
were 30 Hz, 28 Hz, and 3? Hz. Although the percentage errors in the
bandwidth estimations are quite large in some cases (8k% error for Region 1
of /u/), the important factor is absolute error. Flanagan* has discussed
difference limens (just perceptible differences) for formant frequencies,
bandwidths, and amplitudes. He mentions that the difference limen for formant
frequency appears to be 3 - 5% while that of formant bandwidth appears to be
30 - 40%. It is suggested that Flanagan's bandwidth limen is meaningful only
in the sense that amplitude is also being allowed to vary. As stated by
Flanagan, a 30 - 40% change in bandwidth causes roughly a 1.5 dB amplitude
change, which just happens to also be the difference limen for formant
amplitude. It is postulated that, if both formant amplitude and frequency
are held constant (as could be done with a parallel analog synthesizer),
20 - 30 Hz deviations on $B_n$ (corresponding to percentage errors of up to
67%) would be totally imperceptible.

In the results presented so far, the assumption has been made that each
region would consist of one and only one formant. Although this appears to
be a realizable assumption for many vowels, it is rather doubtful that the
assumption is valid for back vowels such as /ɑ/ and /ɔ/. From the
Peterson-Barney data, $F_2 - F_1$ equals 270 and 360 Hz for /ɔ/ and /ɑ/, respectively. A


*Flanagan, J. L., "Speech Analysis Synthesis and Perception", Springer-Verlag,
Berlin, Germany, 1965.
modification of the algorithm was developed which estimates both formant
frequencies and assigns a mean bandwidth estimate to each of the formants.
Results for /ɔ/ and /ɑ/ are shown in Table C-6.

VOWEL    F1      Est. F1    B1    Est. B1
/ɑ/      730     696        47    55
/ɔ/      570     527        46    45

VOWEL    F2      Est. F2    B2    Est. B2
/ɑ/      1090    1056       51    55
/ɔ/      840     817        48    45

Table C-6. Estimates of formant frequencies and bandwidths (in Hz)
for closely spaced formants contained within a single region.

The results are extremely good, with a maximum formant frequency error of
43 Hz for $F_1$ of /ɔ/ and a maximum formant bandwidth error of 8 Hz for
$B_1$ of /ɑ/. Certainly these results should not be generalized to imply that
accuracy of this order could be obtained for all situations. However, it
is believed that the technique for extracting the parameters when two closely
spaced formants are contained within a single region is generally valid.

Speech Project, Software and Hardware
The software and hardware development since the last technical report
has been devoted to continued implementation of the SEL 810B Speech Analysis
Laboratory and to continued work on the Video-to-Digital Converter (VIDIG)
software for the Biological Sciences Department. There were three main
aspects to this program, which are as follows:

(1) Completion of Video-to-Digital Converter hardware system, and
definition of the hardware/software complex to provide an on-line system
for biological research.

(2) Continuation of SEL-810B software development including completion
of all of Phase II except the mathematical operations for levels I and II
which are currently under implementation.

(3) Completion of design of SEL-810B interface to speech station.

The  above  items  are  discussed  in  more  detail  in  the  following  paragraphs. 

(a)  On-Line  System  for  Biological  Research 

The next phase of the project with the Video-to-Digital Converter is to
generate the software to complete the combination of hardware and programming
forming a unique tool for biological research.

In  a  joint  meeting  between  the  engineering  and  biological  research  people 
involved  in  this  project,  a  set  of  goals  defining  the  useful  biological 
parameters  to  be  extracted  from  the  biological  On-Line  System  were  laid  out. 

An  outline  of  these  goals  is  as  follows: 

1.  Generalized  input  program  for  biological  data  from  the  video  source 
using  the  VIDIG. 

2.  Establish  scale  to  determine  the  ratio  of  internal  (computer)  units 
to  external  units. 

3.  Display  descriptor  words. 

4.  Sort  and  find  bugpaths 

5. Number or label bugpaths for display

6. Display video or path data

7. Connect bugpaths extrapolating across local voids of data

8. Eliminate unwanted bugpaths

9. Path Dynamics - for individual bugpaths and ensemble averages of
all bugpaths

a) x velocity = $dx/dt$, y velocity = $dy/dt$

b) linear velocity $V(t) = \sqrt{(dx/dt)^2 + (dy/dt)^2}\; e^{i\theta(t)}$,
where $\theta(t)$ = direction of travel

c) curvature, radius of curvature

d) length of paths

e) turning = $\int_0^p \frac{d\theta}{dt}\,dt = \theta(p) - \theta_0$

f) path halts = length of time in position
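Several of the path-dynamics quantities in item 9 can be illustrated with a short Python sketch. It is not the fixed-point implementation planned for the 1800; it assumes a path sampled at a constant frame interval `dt`, uses central differences for the velocities, and accumulates turning from wrapped changes in direction. The function name and the returned dictionary keys are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def path_dynamics(x: Sequence[float], y: Sequence[float], dt: float) -> Dict[str, object]:
    """Velocity, direction, net turning and length of one sampled path."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three samples with matching x and y")
    inner = range(1, len(x) - 1)
    # Central differences approximate dx/dt and dy/dt at interior samples.
    vx: List[float] = [(x[k + 1] - x[k - 1]) / (2 * dt) for k in inner]
    vy: List[float] = [(y[k + 1] - y[k - 1]) / (2 * dt) for k in inner]
    speed = [math.hypot(a, b) for a, b in zip(vx, vy)]
    theta = [math.atan2(b, a) for a, b in zip(vx, vy)]  # direction of travel
    turning = 0.0
    for prev, cur in zip(theta, theta[1:]):
        # Wrap each change in direction into [-pi, pi) before summing.
        turning += (cur - prev + math.pi) % (2 * math.pi) - math.pi
    length = sum(math.hypot(x[k + 1] - x[k], y[k + 1] - y[k])
                 for k in range(len(x) - 1))
    return {"vx": vx, "vy": vy, "speed": speed, "theta": theta,
            "turning": turning, "length": length}
```

Ensemble averages would then be formed by applying the same function to every bugpath and averaging the results.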

The  physical  system  as  it  exists  on  the  IBM  1800  computer  is  shown  in 
Figure  C-48  in  the  form  of  a  block  diagram. 

Note  that  the  Video- Digital  Converter  is  capable  of  quantizing  and 
rendering  to  a  computer  any  televisible  event.  The  light  pen  input  is 
accomplished  by  focusing  a  television  camera  onto  a  clear  plastic  screen 
and  a  small  pen-light  flashlight  serves  as  the  televisible  event.  These 
coordinates  are  read  into  the  computer  then  averaged  and  can  be  displayed 
in  non-store  mode  on  the  display  scope  as  a  pointer  to  data  structures. 

The on-line system for biological research is being developed as a
stand-alone single station on-line system for the 1800 computer similar in
structure to that existing on the 360 computer. The on-line system operates
with the disk, occupies about k of the 16k of core leaving the rest for data
and has full alpha-numeric and curvilinear display available on the Tektronix
611 display scope.


Figure C-48. VIDIG On-Line System for Biological Research
The Biological System will define two levels of this on-line system:
Level VI for the VIDIG input, display and special programs necessary for
defining and operating on the three dimensional data structures unique to
the VIDIG. Level V will be for the display and further analysis of the
biological data after processing and refinement of Level VI. Data is
manipulated with operator control via the keyboard input. Each operation calls
the macro controller program which allows for two alphabetic and three numeric
trailing predicates and is terminated by pushing the "Return" button. For
example, the "Display" operator on Level VI for viewing the quantized video
frames stored in core has the following format:

DISPLAY α1 α2 N1, N2, N3 RETURN

where

α1 = D  display in Dot mode
     L  display Line connecting the dots
α2 = N  print the page number at the starting coordinate of each page
N1 = starting page number to be displayed
N2 = number of pages to be displayed
N3 = delay count for slowing down the display if desired

The operation is defined if none of the trailing predicates are specified
(i.e. Display Return). This will display all of the video pages in dot mode
with no delay.
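To make the predicate handling concrete, here is a hedged Python sketch of how such a command line might be decoded. The macro controller itself is not described in enough detail to reproduce, so everything here is an assumption made for illustration: the textual token syntax, the names `DisplayCommand` and `parse_display`, and the default starting page of 1. Only the defaults of dot mode, all pages, and no delay come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisplayCommand:
    mode: str = "D"                    # "D" = dot mode, "L" = connecting lines
    print_page_numbers: bool = False   # set by the alphabetic predicate "N"
    start_page: int = 1                # assumed default (not stated in the report)
    page_count: Optional[int] = None   # None means all video pages
    delay: int = 0                     # delay count; 0 means no delay


def parse_display(line: str) -> DisplayCommand:
    """Decode a hypothetical textual form of the DISPLAY operator."""
    tokens = line.replace(",", " ").split()
    if not tokens or tokens[0].upper() != "DISPLAY":
        raise ValueError("not a DISPLAY command")
    args = tokens[1:]
    if args and args[-1].upper() == "RETURN":
        args = args[:-1]               # the Return button terminates the command
    alphas = [t.upper() for t in args if t.isalpha()]
    numbers = [int(t) for t in args if t.isdigit()]
    if len(alphas) + len(numbers) != len(args):
        raise ValueError("predicates must be alphabetic or numeric")
    if len(alphas) > 2 or len(numbers) > 3:
        raise ValueError("at most two alphabetic and three numeric predicates")
    cmd = DisplayCommand()
    for a in alphas:
        if a in ("D", "L"):
            cmd.mode = a
        elif a == "N":
            cmd.print_page_numbers = True
        else:
            raise ValueError("unknown alphabetic predicate: " + a)
    for name, value in zip(("start_page", "page_count", "delay"), numbers):
        setattr(cmd, name, value)
    return cmd
```

With no trailing predicates the parser returns the defaults, mirroring the "Display Return" behavior described in the text.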

Currently the Level VI programs, most of which are disk based, are well
underway and may be broken into three broad categories:

a. This program initializes the data input channel to read in the VDP at
maximum rate and defines the data structure to be used.

b. Load/Store is for transferring data within core or to and from
portions of the disk allotted for data storage.

c. Display for viewing quantized video data in core or for displaying
the "bugpaths" on data that has been processed.

d. Initialize light pen input: This program initializes the digital
input channel to read 16 coordinates from the VIDIG, average and display in
non-store mode the non-zero coordinates in the list, and make these
coordinates available to other programs. This program then continues to run
when the computer is idle until the Reset button is pushed.
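The averaging step of the light pen program can be sketched as follows. This is a minimal Python illustration, assuming that a reading of (0, 0) means no light was detected on that sample and that integer averaging is adequate for positioning the pointer; the function name is hypothetical.

```python
from typing import List, Optional, Tuple


def average_light_pen(samples: List[Tuple[int, int]]) -> Optional[Tuple[int, int]]:
    """Average the non-zero coordinates from one batch of VIDIG readings."""
    hits = [(x, y) for x, y in samples if x != 0 or y != 0]
    if not hits:
        return None  # pen not seen in this batch; nothing to display
    n = len(hits)
    return (sum(x for x, _ in hits) // n, sum(y for _, y in hits) // n)
```

Ignoring the zero entries keeps a partially detected pen-light from dragging the displayed pointer toward the origin.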

2. Special Purpose Programs

a. Find Bugpaths: This program sorts through the video data, which
is read in on a frame by frame basis, searching for possible bugpaths that
link together in time and space for later calculation of path dynamics.

b. Ensemble averaging of path dynamics is necessary for statistical
averages of such parameters as velocity, changes in direction, net displacement,
and rotation before and after the application of a stimulus.

c. Evaluate descriptor words: At the beginning of each new frame
of video data, the VIDIG sends to the computer a descriptor word which
contains an encoding of the current scan rate and four bits of information
conveying the on-off status of four possible stimuli applied to the biological
organisms under study. This program displays this information on the display
scope in a mode specified by keyboard control.

d. Find Centroids: This program calculates the centroids of points
that lie within a user specified mask about each coordinate in the data.
This can then be used to replace the outlines of organisms with their
centroid point for path dynamics calculations.
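The centroid operation of item d can be illustrated with a short Python sketch. The shape of the mask is not specified in the text, so a square mask of a given half-width is assumed here; the function and parameter names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def centroid_in_mask(points: List[Point], center: Point,
                     half_width: float) -> Optional[Point]:
    """Centroid of the points inside a square mask centered on one coordinate."""
    cx, cy = center
    inside = [(x, y) for x, y in points
              if abs(x - cx) <= half_width and abs(y - cy) <= half_width]
    if not inside:
        return None
    n = len(inside)
    return (sum(x for x, _ in inside) / n, sum(y for _, y in inside) / n)
```

Applied about each outline coordinate of an organism, the mask gathers the neighboring outline points and returns a single representative point for the path dynamics calculations.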

3.  Mathematical  Operations 

These  operations  will  be  fixed  point  routines  to  calculate  those 
mathematical  parameters  useful  in  biological  research  within  the  defined 
data  structures. 

Double  predicate  operations :  0,0  ,  £T) 

Single  predicate  operators:  Square,  square  root,  central  difference, 
central  sum,  arctangent. 

(b)  SEL-810B  Software 

The software development program for the SEL-810B, which was outlined
in the Thirteenth Quarterly Report, is about two months behind the proposed
target date of June 30, 1970. As of this writing Phase II of the software
development is nearing completion and it is anticipated that Phase II should
be operational early in August.

Following the completion of Phase II, the speech system software (Phase III)
will be added to the SEL-810B system. Since the speech analysis/synthesis
software is now well defined, a fully operational speech analysis/synthesis
system should be available on the SEL-810B by the end of the summer. The
present status of the SEL software system is summarized in the following
paragraphs.

The  Machine  Language  Disk  Controller 

A routine was written in SEL 810 machine language for the purpose of
loading and storing programs such as the assembler on disk. This routine
accepted a block number in the switch register and loaded or stored all
core starting at that block; it was of utility in generating the SEL 810B
operating system.

The SEL 810B Operating System

An  SEL  operating  system  was  designed  to  provide  a  set  of  basic  operators 
for  loading  and  storing  programs  on  disk,  linking  various  programs  together, 
and  executing  programs  resident  in  core.  This  routine  accepts  operator 
commands  from  the  SEL  teletype  keyboard. 

The  SEL  Mnembler 

The SEL Mnembler two-pass assembler was modified to make use of the
operating system routines; the assembler reads cards from the card reader,
makes the two passes from disk, and writes relocatable object output on
disk.

The SEL Relocatable Loader

The SEL Relocatable Loader was modified to read relocatable object
output produced by the assembler from disk and load it into core.

Construction of the SEL 810B On-Line Speech Acquisition and Analysis System

During this reporting period the basic controllers for an SEL 810B
two-station on-line system have been developed.

Input Keyboard Interrupt Processing Routine and OPCON

The input keyboard interrupt processing routine and operational controller
(OPCON) have been fully written and checked out. These include the routines
for processing of console programs (USER) and repeating buttons (REPEAT).
Average button processing time (overhead) has been measured to be 90 μsec.

Display  Generation 

Display generation routines have been implemented to provide character
display and line drawing. The character generation routines provide a variable
character size. The typing level operations have been implemented. The
curvilinear display program accepts two lists of integers normalized with
a scale of 1024.

Debugging  Level 

The debugging level has been implemented for on-line debugging ease. This
provides a set of software registers, operations to evaluate and modify
core, read and write disk, and an on-line assembler.

Data Structure Manipulation Routines

A  universal  data  structure  has  been  developed  for  loading  and  storing 
variable  length  blocks  of  data.  This  structure  is  used  to  load  disk-based 
routines  into  core,  perform  necessary  relocation,  update  and  load  user 
programs,  floating  point  lists  (Level  II  data),  speech  data,  etc.  Routines 
provided  to  manipulate  the  data  structure  are  load  item,  update  item,  and 
repack  item  (which  is  the  structure's  "garbage  collector"). 

Storage  Allocation 

A  memory  paging  scheme  has  been  devised  for  main  storage  management. 
Elements  in  the  data  structure  are  assigned  priorities  in  their  competition 
for  main  storage.  When  main  storage  is  completely  used,  the  element  with 
minimum  priority  is  purged  from  core  in  the  attempt  to  allocate  space. 
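The purge rule can be illustrated with a small Python sketch. It is a simplified model only: the real scheme manages pages of SEL core, whereas this example tracks sizes in arbitrary units, and the names `allocate`, `resident`, and the sample elements are hypothetical.

```python
from typing import Dict, List, Tuple


def allocate(resident: Dict[str, Tuple[int, int]], capacity: int,
             name: str, size: int, priority: int) -> List[str]:
    """Bring an element into main storage, purging minimum-priority elements.

    `resident` maps element name -> (size, priority) and is updated in place.
    Returns the names of the elements that were purged to make room.
    """
    if size > capacity:
        raise ValueError("element is larger than main storage")
    used = sum(s for s, _ in resident.values())
    purged: List[str] = []
    while used + size > capacity:
        # The element with minimum priority is purged first.
        victim = min(resident, key=lambda k: resident[k][1])
        used -= resident[victim][0]
        del resident[victim]
        purged.append(victim)
    resident[name] = (size, priority)
    return purged
```

In this model a purged element is simply dropped; in the system described it would remain on disk in the universal data structure and be reloaded when next required.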

Console Program Generation

The console program generation (LIST) routine has been implemented for
console program building, editing, and storage.

Assembler for 360/75

An assembler for the SEL 810B has been written for the 360/75. The
purpose is to provide an assembly language listing on the 360 line printer
with optional object output on punched cards.

The  Numerical  Operators 

The numerical operators on the integer level have been implemented.
The floating point arithmetic routines add, subtract, multiply, and negate
have been written and checked out. The floating point data format for the
SEL 810B has been chosen to contain 32 bits of mantissa and 16 bits of
characteristic (scale of 2). The mathematical levels I and II (floating
point single number operations, vector operations respectively) are currently
being implemented. Additional levels for speech synthesis/analysis, and
phoneme manipulation are being designed.
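The report gives only the field widths of the floating point format, so the following Python sketch fills in the remaining details by assumption: a signed 32-bit mantissa treated as a fraction with magnitude in [1/2, 1), and a signed 16-bit characteristic giving the power of two. The function names are hypothetical, and the actual SEL 810B representation of sign and normalization may differ.

```python
import math
from typing import Tuple

MANTISSA_BITS = 32
CHARACTERISTIC_BITS = 16
SCALE = 1 << (MANTISSA_BITS - 1)  # fraction scale for a signed 32-bit mantissa


def encode(x: float) -> Tuple[int, int]:
    """Split x into (mantissa, characteristic) with x ~ mantissa/2**31 * 2**char."""
    if x == 0.0:
        return (0, 0)
    frac, exp = math.frexp(x)       # x = frac * 2**exp with 0.5 <= |frac| < 1
    mant = int(round(frac * SCALE))
    if mant == SCALE:               # rounding carried past the largest mantissa
        mant >>= 1
        exp += 1
    limit = 1 << (CHARACTERISTIC_BITS - 1)
    if not -limit <= exp < limit:
        raise OverflowError("characteristic does not fit in 16 bits")
    return (mant, exp)


def decode(mant: int, exp: int) -> float:
    """Inverse of encode, up to the rounding of the mantissa."""
    return math.ldexp(mant / SCALE, exp)
```

A 32-bit mantissa carries roughly nine decimal digits, which is the precision the round trip below preserves.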

(c) SEL-810B Interface to Speech Station

An on-line keyboard and Tektronix 611 display scope have been
incorporated into the SEL hardware speech system and are presently being used for
the completion of Phase II of the software system. Since D/A converters
are needed to drive the 611 display scope, these have also been added to
the system. The present hardware configuration is depicted in block diagram
form in Figure C-49.

The present hardware configuration is characterized by the restriction
that all data transfers to and from the SEL must be made under direct program
control rather than under the supervision of either of the two available
Block Transfer Control Units contained within the SEL. Three devices are
daisy chained to the SEL in the present system. These are a teletype, a
card reader, and a four channel I/O multiplexor, two channels of which are
used. One channel provides a data link between the SEL and the KW-400, which
has been modified to serve as a disk controller for the IBM 1311 disk drive and
satellite. The other channel of the multiplexor is tied to an on-line
console controller. The input side of this controller sends a process
interrupt to the SEL every time a keyboard button is pressed. The SEL
responds to the interrupt by executing a digital input from the controller
and thus reading the button code.

Figure C-49. Present SEL-810B hardware configuration.

The output side of the on-line console controller drives a Tektronix
611 storage scope. Ten bits of each output are DAC data. One bit specifies
whether the x or y DAC is to be loaded. One bit triggers the unblank one-shot
and another bit is for erase.
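The output word can be pictured with the following Python sketch. The report states which fields exist but not where they sit in the word, so the bit positions used here (data in bits 0-9, DAC select in bit 10, unblank in bit 11, erase in bit 12) are purely an assumed layout, and the function names are hypothetical.

```python
from typing import List

DATA_MASK = 0x3FF      # ten bits of DAC data
Y_SELECT = 1 << 10     # assumed position: 0 loads the x DAC, 1 loads the y DAC
UNBLANK = 1 << 11      # assumed position: triggers the unblank one-shot
ERASE = 1 << 12        # assumed position: erases the storage screen


def dac_word(value: int, axis: str, unblank: bool = False, erase: bool = False) -> int:
    """Assemble one output word for the console controller (assumed layout)."""
    if not 0 <= value <= DATA_MASK:
        raise ValueError("DAC data must fit in ten bits")
    if axis not in ("x", "y"):
        raise ValueError("axis must be 'x' or 'y'")
    word = value
    if axis == "y":
        word |= Y_SELECT
    if unblank:
        word |= UNBLANK
    if erase:
        word |= ERASE
    return word


def plot_point(x: int, y: int) -> List[int]:
    """Two outputs per point: load x, then load y and unblank the beam."""
    return [dac_word(x, "x"), dac_word(y, "y", unblank=True)]
```

Ten bits of DAC data give 1024 positions per axis, which matches the scale of 1024 used by the curvilinear display program.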

The SEL-810B hardware system which is presently under development is
shown in Figure C-50. The design of the system has been completed and the
system is currently under construction. The system is characterized by the
capability for two simultaneous block transfer I/O operations. In addition,
the EW 400 Switch/disk controller will be replaced by a disk controller whose
design has been optimized for the problem of recording long intervals of speech.
This device will facilitate the use of the SEL core as a double buffer.
SYDIREHS, a SYstem for DIgitally REcording Human Speech, will execute input
block transfers while the disk controller is executing output to the 1311
disk memory.

There are two additional I/O control devices essential to the system.
One is a two station On-Line Interface and the second a Synchronous Nanohumper
Sampler. This latter device is programmed from the on-line console to
sample the output of a Nanohumper at fixed intervals of time as prescribed
by the on-line user. This allows the Nanohumper to serve as a conventional

*Described in the 13th Quarterly Report


A/D converter with selectable sample rates. Thus it is seen that the system
has four essential hardware I/O device controllers: the Disk Controller,
SYDIREHS, the On-Line Interface, and the Synchronous Nanohumper Sampler.
In order to accommodate the I/O devices and to leave room for future expansion
of the hardware system, a versatile I/O multiplexor has been proposed and
is presently under construction.

To the device side of the multiplexor sixteen device controllers are
connected. To the computer side three distinct data transfer facilities
are connected. Two of these facilities are the Block Transfer Control units
or BTC's. The third facility is the standard I/O control set and the data
bus. All data is actually transferred over the data bus. However, from
the viewpoint of an external device, it is as though that device can be
tied to any one of three data channels. Any device may transfer data via
any one of the three data transfer facilities. A device is "tied" to one
of the BTC's by means of a command issued over the standard I/O control set.

Once this tie is made, the device will remain tied to that BTC until
the entire data block has been transferred. Responding to the device's
data transfer request, the BTC will grant each request according to priorities
among the other BTC and the standard I/O control set. The I/O Multiplexor
enables the SEL to transfer data to or from any two devices in block mode
while communicating with any of the fourteen remaining devices in the Direct
Program Control mode using the Standard I/O control set. (Direct Program
Control means that a full I/O instruction is executed for each data word
transferred.)

The Disk Controller will transfer data via Block Transfer Control.
Consequently, only one I/O instruction is required to transfer an entire block
of data. When a block transfer begins, the SEL provides the controller with
the starting sector address and disk surface. These parameters are
automatically incremented if the size of the data block requires. At the end
of the transfer, a disk status word is presented to the SEL. It identifies
the final sector address and the final disk surface. This feature facilitates
the recording of a full cylinder of speech data with minimal program
intervention.
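The automatic incrementing of sector address and disk surface can be modeled with a few lines of Python. The sketch assumes a geometry of 20 sectors per track and 10 surfaces per cylinder, which is offered only as an illustrative default since the formatting used with this controller is not given; the function name and the (sector, surface) form of the status result are hypothetical.

```python
from typing import Tuple


def final_disk_address(start_sector: int, start_surface: int, n_sectors: int,
                       sectors_per_track: int = 20,
                       surfaces: int = 10) -> Tuple[int, int]:
    """Return (final sector, final surface) after a block transfer.

    The sector address advances with each sector transferred and rolls over
    to the next surface at the end of a track, all within one cylinder.
    """
    if n_sectors < 1:
        raise ValueError("a block transfer moves at least one sector")
    if not (0 <= start_sector < sectors_per_track and 0 <= start_surface < surfaces):
        raise ValueError("starting address is outside the cylinder")
    last = start_surface * sectors_per_track + start_sector + n_sectors - 1
    if last >= sectors_per_track * surfaces:
        raise ValueError("block would run past the end of the cylinder")
    return (last % sectors_per_track, last // sectors_per_track)
```

Under the assumed geometry, a transfer of 200 sectors starting at sector 0 of surface 0 ends at sector 19 of surface 9, that is, one full cylinder written with a single I/O instruction.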

Conclusion 

Tasks  established  in  the  original  contract  have  been  accomplished. 
Research  to  date  has  produced  reliable  end-products  which  are  being 
successfully  employed  in  computer  technology;  it  has  also  produced  other 
promising  factors  which  warrant  further  exploration  particularly  in  the 
areas  relating  to  computer  networks  and  the  communication  between  humans 
and  computers. 